Solved – Why does the accuracy not change when applying different alpha values in L2 regularisation

The figure below shows the accuracy using different alpha values in L2 regularisation. As long as alpha is small, in the range $10^{-12}$ to $10^{-2}$, the accuracy remains the same. I do understand that when the alpha value is $10^{1}$ or greater, the penalty shrinks the weights to the point where they no longer fit the data well, resulting in under-fitting. What is the reason the accuracy remains the same for smaller alpha values? The formula is: $$w^{(2)} = w^{(1)} - \alpha \, w^{(1)}$$

where $w^{(1)}$ and $w^{(2)}$ are, respectively, the current and the regularised weight parameters, and $\alpha$ is the regularisation parameter specifying the amount of regularisation.
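For concreteness, here is a minimal NumPy sketch of that update (the weight values are purely illustrative). As written, the update multiplies each weight by $(1 - \alpha)$, so tiny values of $\alpha$ barely move the weights at all:

```python
import numpy as np

def l2_decay_step(w, alpha):
    """One regularisation step: w2 = w1 - alpha * w1 = (1 - alpha) * w1."""
    return w - alpha * w

w1 = np.array([0.8, -1.2, 0.5])   # illustrative weights
for alpha in (1e-12, 1e-2, 0.5):
    print(alpha, l2_decay_step(w1, alpha))
# alpha = 1e-12 leaves the weights essentially unchanged;
# alpha = 1e-2 shrinks them by 1%;
# alpha = 0.5 halves them, pulling them strongly toward zero.
```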

[Figure: accuracy when applying different alpha values in L2 regularisation]

The various loss functions reflect different answers to the question "What makes a model good?" Choosing one loss function over another is implicitly choosing one interpretation of "model goodness" over another.

In the case of inaccuracy, the loss function just checks whether the largest predicted value matches the target class, and the loss is the fraction of observations predicted incorrectly.

Cross-entropy loss awards lower loss to predictions which are closer to the class label.

The difference between cross-entropy loss and inaccuracy is like the difference between taking a course pass/fail and taking it for a letter grade. A pass/fail rubric just tells you whether minimum requirements were satisfied; a letter grade tells you how well a student performed. Thus the inaccuracy score conceals the effect of the $\alpha$ parameter: the predicted values for the observations are almost certainly changing at different values of $\alpha$, but the relative ranking of the scores isn't, so (in)accuracy is flat.

For illustration, consider the following predicted probabilities of class $1$ under two regularisation settings, $\alpha_1$ and $\alpha_2$:

| observation | $\alpha_1$ prediction | $\alpha_2$ prediction | label |
|-------------|-----------------------|-----------------------|-------|
| 1           | 0.01                  | 0.49                  | 0     |
| 2           | 0.99                  | 0.51                  | 1     |

These two models obviously have the same (in)accuracy on this data, but their cross-entropy losses are completely different.

The inaccuracy is obviously $0.0$ for both $\alpha_1$ and $\alpha_2$, using the rule that the argmax of the predictions is the predicted class. Moreover, since the inaccuracy is bounded below by $0.0$, it is impossible to improve upon either model when the models are compared on the basis of inaccuracy loss.
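A quick sketch of that argmax rule applied to the table above (array names are illustrative; with two classes, argmax reduces to thresholding the class-1 probability at $0.5$):

```python
import numpy as np

labels = np.array([0, 1])              # true classes for observations 1 and 2
p_alpha1 = np.array([0.01, 0.99])      # P(class 1) under alpha_1
p_alpha2 = np.array([0.49, 0.51])      # P(class 1) under alpha_2

for name, p in [("alpha_1", p_alpha1), ("alpha_2", p_alpha2)]:
    preds = (p > 0.5).astype(int)      # argmax over two classes
    print(name, "inaccuracy:", np.mean(preds != labels))  # 0.0 for both
```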

However, the cross-entropy loss in the case of $\alpha_1$ is $-\log(0.99) - \log(1 - 0.01) = -2\log(0.99) \approx 0.02$.

For $\alpha_2$, the cross-entropy loss is $-\log(0.51) - \log(1 - 0.49) = -2\log(0.51) \approx 1.35$. This shows that a metric which is sensitive to the confidence of the predictions is more informative than one which is not: more (or less) confidence about the correct class is reflected in the loss.
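The same arithmetic in NumPy (natural logs, as in the calculations above; names are illustrative):

```python
import numpy as np

labels = np.array([0, 1])
p_alpha1 = np.array([0.01, 0.99])   # P(class 1) under alpha_1
p_alpha2 = np.array([0.49, 0.51])   # P(class 1) under alpha_2

def cross_entropy(p, y):
    # -log P(true class): -log(p) when y = 1, -log(1 - p) when y = 0
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

print(cross_entropy(p_alpha1, labels))  # ~0.02
print(cross_entropy(p_alpha2, labels))  # ~1.35
```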

Stated another way, the (in)accuracy of $\alpha_1$ and $\alpha_2$ is the same, so inaccuracy loss on its own is not sufficient to distinguish between the two models.

By contrast, the cross-entropy loss is completely different: $\alpha_1$ is the better model according to cross-entropy loss, because its loss is lower than the loss for $\alpha_2$.
