How to interpret “loss” and “accuracy” for a machine learning model

When I train my neural network with Theano or TensorFlow, they report a variable called "loss" once per epoch.

How should I interpret this variable? Is a higher loss better or worse, and what does it mean for the ultimate performance (accuracy) of my neural network?

+152
deep-learning machine-learning neural-network mathematical-optimization objective-function
Dec 29 '15 at 20:33
3 answers

The lower the loss, the better the model (unless the model has overfitted to the training data). The loss is calculated on the training and validation sets, and its interpretation is based on how well the model is doing on those two sets. Unlike accuracy, loss is not a percentage: it is a sum of the errors made on each example in the training or validation set.

In the case of neural networks, the loss is usually the negative log-likelihood for classification and the residual sum of squares for regression. Naturally, the main objective in training a model is to reduce (minimize) the value of the loss function with respect to the model's parameters, by changing the weight vector values through an optimization method such as backpropagation in neural networks.
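To make that concrete, here is a rough NumPy sketch of those two losses. This is illustrative only: frameworks often average over examples rather than sum, and the exact form they report varies.

```python
import numpy as np

def nll_loss(y_true, y_prob):
    # Negative log-likelihood (binary cross-entropy), summed over examples:
    # -sum[ y*log(p) + (1 - y)*log(1 - p) ]
    eps = 1e-12                       # guard against log(0)
    p = np.clip(y_prob, eps, 1 - eps)
    return -np.sum(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def rss_loss(y_true, y_pred):
    # Residual sum of squares, the usual regression loss.
    return np.sum((y_true - y_pred) ** 2)

print(nll_loss(np.array([1, 0, 1]), np.array([0.9, 0.2, 0.7])))  # good fit -> small loss
print(nll_loss(np.array([1, 0, 1]), np.array([0.2, 0.9, 0.3])))  # bad fit -> large loss
```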

The loss value indicates how well or badly a model behaves after each optimization iteration. Ideally, one expects the loss to decrease after each iteration, or after every few iterations.

The accuracy of a model is usually determined after the model parameters have been learned and fixed, and no further training takes place. The test samples are then fed to the model, and the number of mistakes (zero-one loss) the model makes is recorded after comparison with the true targets. The percentage of misclassification is then calculated.

For example, if the number of test samples is 1000, and the model correctly classifies 952 of them, then the accuracy of the model is 95.2%.
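In code, that calculation is just a comparison and a mean. A small NumPy sketch; the arrays here are made up to reproduce the 952/1000 example:

```python
import numpy as np

def accuracy(y_true, y_pred):
    # Fraction of samples classified correctly (1 minus the zero-one loss).
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))

# 1000 test samples, of which 952 predictions match the true targets -> 0.952
y_true = np.zeros(1000, dtype=int)
y_pred = np.array([0] * 952 + [1] * 48)
print(accuracy(y_true, y_pred))  # 0.952
```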


There are also some subtleties in reducing the loss value. For example, you may run into overfitting, in which the model "memorizes" the training examples and becomes ineffective on the test set. Overfitting also occurs when you do not use regularization, when the model is very complex (the number of free parameters W is large), or when the number of data points N is very small.
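A common way to catch this in practice is to track the loss on both sets and stop when the validation loss starts rising. A minimal sketch, assuming hypothetical train_epoch and eval_loss helpers (those names are illustrative, not from any particular framework):

```python
best_val_loss = float("inf")
patience, bad_epochs = 5, 0

for epoch in range(200):
    train_loss = train_epoch(model, train_data)   # hypothetical helper
    val_loss = eval_loss(model, val_data)         # hypothetical helper
    if val_loss < best_val_loss:
        best_val_loss, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1   # training loss may still fall while validation loss rises
    if bad_epochs >= patience:
        print(f"stopping early at epoch {epoch}: the model has begun to memorize")
        break
```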

+219
Dec 29 '15

These are two different metrics for evaluating your model's performance, and they are typically used at different stages.

Loss is used in the training process to find the "best" parameter values for your model (e.g., the weights in a neural network). It is what you try to minimize during training by updating the weights.

Accuracy is more practical. Once you have found the optimized parameters above, you use this metric to evaluate how accurate your model's predictions are compared to the true data.

Let's use a toy classification example. You want to predict gender from weight and height. You have 3 data points, as follows (0 stands for male, 1 stands for female):

y1 = 0, x1_w = 50 kg, x1_h = 160 cm;

y2 = 0, x2_w = 60 kg, x2_h = 170 cm;

y3 = 1, x3_w = 55 kg, x3_h = 175 cm;

You use a simple logistic regression model: y_hat = 1 / (1 + exp(-(b1 * x_w + b2 * x_h)))

How do you find b1 and b2? You first define a loss, then use an optimization method to minimize it iteratively by updating b1 and b2.

In our example, a typical loss for this binary classification task is the negative log-likelihood (binary cross-entropy), summed over the examples:

loss = -Σ_i [ y_i * log(y_hat_i) + (1 - y_i) * log(1 - y_hat_i) ]

We do not know what b1 and b2 should be, so let's make a random guess, say b1 = 0.1 and b2 = -0.03. What is our loss now?

The model's predictions are:

y_hat1 = 1 / (1 + exp(-(0.1 * 50 - 0.03 * 160))) ≈ 0.550
y_hat2 = 1 / (1 + exp(-(0.1 * 60 - 0.03 * 170))) ≈ 0.711
y_hat3 = 1 / (1 + exp(-(0.1 * 55 - 0.03 * 175))) ≈ 0.562

so the loss is -( log(1 - 0.550) + log(1 - 0.711) + log(0.562) ) ≈ 2.6.
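If you want to check this arithmetic, here is a minimal NumPy sketch (the variable names are mine):

```python
import numpy as np

x_w = np.array([50.0, 60.0, 55.0])     # weight in kg
x_h = np.array([160.0, 170.0, 175.0])  # height in cm
y   = np.array([0.0, 0.0, 1.0])        # 0 = male, 1 = female

b1, b2 = 0.1, -0.03                    # the random guess
y_hat = 1.0 / (1.0 + np.exp(-(b1 * x_w + b2 * x_h)))
loss = -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
print(y_hat)   # ~ [0.550, 0.711, 0.562]
print(loss)    # ~ 2.6
```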

You then use a learning algorithm (e.g., gradient descent) to find a way to update b1 and b2 that reduces the loss; a sketch follows below.
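Here is what such an update loop could look like for this toy model, using plain gradient descent with the standard analytic gradient for sigmoid plus cross-entropy. The learning rate and step count are arbitrary picks of mine, and the arrays from the sketch above are reused:

```python
X = np.column_stack([x_w, x_h])   # reuse the arrays from the sketch above
b = np.array([0.1, -0.03])        # start from the same random guess
lr = 1e-5                         # learning rate: small because features are unscaled

for step in range(10000):
    y_hat = 1.0 / (1.0 + np.exp(-(X @ b)))
    grad = X.T @ (y_hat - y)      # gradient of the cross-entropy loss w.r.t. b
    b = b - lr * grad             # step downhill to reduce the loss

y_hat = 1.0 / (1.0 + np.exp(-(X @ b)))
print(b)
print(-np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))  # lower than 2.6
```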

What if b1 = 0.1 and b2 = -0.03 were the final b1 and b2 (the output of gradient descent)? What would the accuracy be?

Suppose that if y_hat >= 0.5, we predict female (1), otherwise 0. Then our algorithm predicts y1 = 1, y2 = 1, and y3 = 1. What is our accuracy? We predict y1 and y2 incorrectly and y3 correctly, so our accuracy is 1/3 ≈ 33.33%.
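The same check in code, reusing the arrays from above (a sketch, not the original computation):

```python
y_hat = 1.0 / (1.0 + np.exp(-(0.1 * x_w - 0.03 * x_h)))  # the guessed b1, b2
y_pred = (y_hat >= 0.5).astype(int)  # threshold at 0.5: 1 = female, 0 = male
print(y_pred)                # [1 1 1]
print(np.mean(y_pred == y))  # 0.333... -> 33.33% accuracy
```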

PS: in Amir's answer, backpropagation is called an optimization method in NNs. I think it is better seen as a way to compute the gradients of the weights in an NN. Common optimization methods in NNs are gradient descent and Adam.

+14
Oct 17 '17 at 22:46

@Aadnan Just to clarify the training/validation/test datasets: the training set is used to perform the initial training of the model, initializing the weights of the neural network.

The validation set is used after the neural network has been trained. It is used to tune the network's hyperparameters and compare how changes to them affect the predictive accuracy of the model. Whereas the training set can be thought of as being used to build the neural network's gate weights, the validation set allows fine-tuning of the parameters or architecture of the neural network model. It is useful because it allows repeated comparison of these different parameters/architectures against the same data and network weights, to observe how parameter/architecture changes affect the predictive power of the network.

Then, the test set is used only to assess the predictive accuracy of the trained neural network on previously unseen data, after training and parameter/architecture selection with the training and validation datasets.
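One common way to produce these three sets, sketched with scikit-learn's train_test_split applied twice; the 60/20/20 split ratio and the pre-existing feature matrix X and label vector y are assumptions for illustration:

```python
from sklearn.model_selection import train_test_split

# Assume a feature matrix X and label vector y already exist.
# First carve off a 20% test set, then split the rest 75/25 into
# train/validation, giving a 60/20/20 split overall.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=42)
```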

+4
Nov 01 '17 at 14:46


