Solved – If the L2 regularization parameter is high and the learning rate low, can the cost of the cross-entropy loss function increase?

I coded a neural network from scratch. When the regularization parameter is too high and the learning rate too low, the cost increases. I suspect that the extra cost the regularization term adds to the loss function is responsible for this. When the regularization parameter is set to zero, I always get a nice decrease in the cost function. Can you please explain what is happening?

$L_2$ regularization adds a parabola with its minimum at the origin to the loss surface along every weight dimension. How steeply that parabola rises is set by the size of the $L_2$ penalty coefficient. If the coefficient is too large, the regularization term overwhelms the signal from the cross-entropy loss, because the shape of the surface is dominated by the massive penalty for moving the norm of the weights away from 0.
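Written out (using $\lambda$ for the penalty coefficient and $w$ for the weight vector, which is my notation rather than anything from the question), the objective being minimized is

$$J(w) \;=\; \underbrace{\mathrm{CE}(w)}_{\text{cross-entropy}} \;+\; \underbrace{\tfrac{\lambda}{2}\,\lVert w \rVert_2^2}_{L_2\text{ penalty}},$$

and the second term is exactly the parabola described above: it is zero at $w = 0$ and grows quadratically, with $\lambda$ controlling how fast.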

If you imagine starting somewhere near zero, moving further from zero increases the magnitude of the penalty dramatically. If the increase in the penalty is larger than the decrease in cross-entropy loss, the net effect is that the total loss goes up.
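As a concrete (made-up) illustration: with $\lambda = 10$, moving a single weight from $0$ to $0.3$ adds $\tfrac{10}{2}(0.3)^2 = 0.45$ to the penalty term. If that same move only reduces the cross-entropy by, say, $0.1$, the total loss still rises by $0.35$, even though the classifier itself got better.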

This is one of the reasons I prefer to track the total penalized loss, the penalty, and the classification loss as separate quantities: logging all three makes this kind of unusual behavior immediately obvious.
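As a minimal sketch of that bookkeeping (a toy logistic-regression model on synthetic data, with the variable names and the deliberately large $\lambda$ chosen by me for illustration, not taken from the original question):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification problem (synthetic data, purely illustrative).
X = rng.normal(size=(200, 5))
y = (X @ rng.normal(size=5) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(5)
lam = 10.0   # deliberately large L2 penalty coefficient
lr = 0.01    # small learning rate

for epoch in range(100):
    p = sigmoid(X @ w)
    eps = 1e-12  # guard against log(0)
    cross_entropy = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    penalty = 0.5 * lam * np.sum(w ** 2)
    total = cross_entropy + penalty

    # Gradient of the penalized objective (cross-entropy gradient plus lam * w).
    grad = X.T @ (p - y) / len(y) + lam * w
    w -= lr * grad

    if epoch % 20 == 0:
        print(f"epoch {epoch:3d}  total={total:.4f}  ce={cross_entropy:.4f}  penalty={penalty:.4f}")
```

Seeing the three quantities side by side makes it clear whether a rising total cost comes from the classifier getting worse or from the penalty term swallowing everything else.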
