hyperparameter
Like every machine learning model, CNNs contain many formulas with many parameters. During
training the model learns from the data in the sense of optimizing the parameters. However, such models
can have other, additional parameters, which are not directly learned during the regular training. These
parameters have values set before starting the training. We refer to these as hyperparameters
in order to distinguish them from the network parameters that are optimized during training. Put
differently, hyperparameters are solver-specific parameters.
Prominent examples are the initial learning rate or the batch size.
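As a purely illustrative Python sketch (not HALCON code; the toy data, model, and names are made up for this example), the following contrasts hyperparameters, which are fixed before training, with the model parameters, which are optimized during training:

    import numpy as np

    # Hyperparameters: chosen before training and left unchanged by the solver.
    learning_rate = 0.01
    batch_size = 10
    epochs = 20

    # Toy data: 100 samples with 3 features and a linear target.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(100, 3))
    y = x @ np.array([1.0, -2.0, 0.5])

    # Model parameters (weights): learned during training.
    w = np.zeros(3)
    for _ in range(epochs):
        for i in range(0, len(x), batch_size):
            xb, yb = x[i:i + batch_size], y[i:i + batch_size]
            grad = 2.0 * xb.T @ (xb @ w - yb) / len(xb)  # gradient of the mean squared error
            w -= learning_rate * grad                    # update of the learned parameters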
inference phase
The inference phase is the stage in which a trained network is applied to predict (infer) instances
(which can be the entire input image or just a part of it) and, depending on the method, their localization.
Unlike in the training phase, the network is no longer changed in the inference phase.
intersection over union
The intersection over union (IoU) is a measure to quantify the overlap of two areas. We
can determine the part common to both areas, the intersection, as well as the combined area, the union. The
IoU is the ratio of the area of the intersection to the area of the union.
The application of this concept may differ between the methods.
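For axis-aligned rectangles, the IoU can be computed as in the following minimal Python sketch (the box format and function name are assumptions made for this illustration, not HALCON code):

    # IoU of two axis-aligned rectangles, each given as (x1, y1, x2, y2)
    # with x1 < x2 and y1 < y2.
    def iou(box_a, box_b):
        # Intersection rectangle (empty if the boxes do not overlap).
        ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0

    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 / 7, about 0.143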
label
Labels are arbitrary strings used to define the class of an image. In HALCON these labels are given by the
image name (possibly followed by a combination of underscore and digits) or by the folder name, e.g.,
’apple_01.png’, ’pear.png’, ’peach/01.png’.
layer and hidden layer
A layer is a building block of a neural network that performs a specific task (e.g., convolution,
pooling, etc.; for further details we refer to the “Solution Guide on Classification”).
It can be seen as a container, which receives weighted input, transforms it, and returns the output to the next
layer. Input and output layers are connected to the dataset, i.e., the images or the labels, respectively. All
layers in between are called hidden layers.
learning rate - hyperparameter
’learning_rate’ The learning rate is the weighting with which the gradient (see
the entry for stochastic gradient descent, SGD) is considered when updating the arguments of the loss
function. In simple words, when we want to optimize a function, the gradient tells us the direction in which
we shall optimize and the learning rate determines how far along this direction we step.
Alternative names: step size
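As a generic illustration (the notation is an assumption made for this glossary, not tied to a specific HALCON operator), one gradient descent step updating the loss function arguments can be written as

    \theta_{t+1} = \theta_t - \lambda \, \nabla L(\theta_t)

where \theta_t denotes the loss function arguments after iteration t, \lambda the learning rate, and \nabla L(\theta_t) the gradient of the loss.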
level
Within a feature pyramid network, the term level denotes the whole group of layers whose feature
maps have the same width and height. The input image represents level 0.
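For example, assuming the common case of stride-2 downsampling between consecutive levels (an assumption made for this illustration, not a property of every network), the feature maps of level k for an input image of width W and height H have a spatial size of roughly

    \frac{W}{2^k} \times \frac{H}{2^k}

which for k = 0 is the full image resolution.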
loss
A loss function compares the prediction of the network with the given information about what it should find in
the image (and, if applicable, also where) and penalizes deviations. This loss function is the function we
optimize during the training process to adapt the network to a specific task.
Alternative names: objective function, cost function, utility function
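As an illustration (not necessarily the loss used by a particular method), a widely used loss for classification is the cross-entropy between the ground-truth label vector y and the predicted class probabilities \hat{y}:

    L(y, \hat{y}) = - \sum_i y_i \log \hat{y}_i

It is small when the predicted probability of the correct class is close to 1 and grows the further this probability drops towards 0.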
momentum - hyperparameter
’momentum’ The momentum μ ∈ [0, 1) is used for the optimization of the loss
function arguments. When the loss function arguments are updated (after having calculated the gradient), a
fraction μ of the previous update vector (of the past iteration step) is added. This has the effect of damping
oscillations. We refer to the hyperparameter μ as momentum. When μ is set to 0, the momentum method has
no influence. In simple words, when we update the loss function arguments, we still remember the step we
did for the last update. Now we take a step in the direction of the gradient, with a length according to the learning
rate, and additionally we repeat the step we did last time, but this time only μ times as long.
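In generic notation (again an illustration, not specific to a particular HALCON operator), the update with momentum μ, learning rate \lambda, loss function arguments \theta_t, and update vector v_t reads

    v_{t+1} = \mu \, v_t - \lambda \, \nabla L(\theta_t)
    \theta_{t+1} = \theta_t + v_{t+1}

For \mu = 0 this reduces to the plain gradient descent step shown in the entry for the learning rate.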
non-maximum suppression
In object detection, non-maximum suppression is used to suppress overlapping predicted
bounding boxes. When different predicted instances overlap by more than a given threshold value, only the one
with the highest confidence value is kept, while the other instances, which do not have the maximum confidence
value, are suppressed.
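The following illustrative Python sketch shows a simplified greedy variant of non-maximum suppression (actual implementations may differ); it reuses the iou helper sketched in the entry for the intersection over union above:

    def nms(boxes, scores, iou_threshold):
        # Process boxes in order of decreasing confidence.
        order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
        kept = []
        while order:
            best = order.pop(0)
            kept.append(best)
            # Suppress all remaining boxes that overlap the kept one too much.
            order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
        return kept

    # Example: with an IoU threshold of 0.3, the second box is suppressed.
    # nms([(0, 0, 2, 2), (0.5, 0.5, 2.5, 2.5)], [0.9, 0.8], 0.3)  -> [0]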
overfitting
Overfitting happens when the network starts to ’memorize’ training data instead of learning how to
find general rules for the classification. This becomes visible when the model continues to minimize the error on
the training set while the error on the validation set increases. Since most neural networks have a huge number
of weights, these networks are particularly prone to overfitting.