Solved – Why is the penalty term added to, rather than subtracted from, the loss term in regularization?

Why is the penalty term $R(f)$ added to a general loss function in regularization, rather than subtracted?

For example,
$$
\operatorname{argmin} \sum L(\theta, \hat{\theta}) + \lambda R(f) \, ?
$$

Let me start with the concept of regularization. Regularization is a means to avoid high variance in a model (also known as overfitting). High variance means that your model is following all the noise and errors in the data rather than the underlying pattern — it is too flexible. Since the idea is to control complexity, we want to penalize the model for overfitting.

The parameters of a model are chosen by minimizing the model's cost function: the best model is the one with minimum cost. Let me take the example of linear regression.

Cost function of the linear model without regularization:
$$
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)^2
$$

Cost function and gradient-descent update of the linear model with regularization:
$$
J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]
$$
$$
\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\,x_j^{(i)}
$$
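The shrinking effect of the regularized update can be sketched with a small gradient-descent loop. This is a minimal NumPy sketch; the data, learning rate, and $\lambda$ value are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy data: 3 features, known true coefficients plus noise
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)
m = len(y)

def fit(lam, alpha=0.1, iters=500):
    """Gradient descent on J = (1/2m)||X theta - y||^2 + (lam/2m)||theta||^2."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        # Gradient of the regularized cost; the (lam/m)*theta term comes
        # from the ADDED penalty and pulls theta toward zero each step.
        grad = X.T @ (X @ theta - y) / m + (lam / m) * theta
        theta = theta - alpha * grad
    return theta

theta_plain = fit(lam=0.0)
theta_reg = fit(lam=5.0)

# The regularized parameters end up shrunk toward zero
print(np.linalg.norm(theta_reg) < np.linalg.norm(theta_plain))  # True
```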

So, by using regularization, the parameters are penalized for overfitting. Since we *minimize* the cost, adding a term that grows with the parameters makes complex models more expensive, so the optimizer is pushed toward smaller parameters. If we subtracted $\lambda R(f)$ instead, larger parameters would *lower* the cost, and the optimizer would be rewarded for complexity rather than penalized. (In the gradient-descent update, the regularization term ends up being subtracted from the parameter, which is exactly what shrinks it toward zero.)
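To see why the sign matters, here is a closed-form comparison on a hypothetical toy dataset (a NumPy sketch, not anyone's reference implementation): adding the penalty shrinks the parameters, while subtracting it inflates them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (made up for illustration): 5 features, only the first matters
X = rng.normal(size=(30, 5))
y = X @ np.array([3.0, 0.0, 0.0, 0.0, 0.0]) + rng.normal(scale=0.5, size=30)

lam = 5.0
I = np.eye(X.shape[1])

# Unregularized least squares: argmin ||X theta - y||^2
theta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge (penalty ADDED): argmin ||X theta - y||^2 + lam * ||theta||^2
theta_add = np.linalg.solve(X.T @ X + lam * I, X.T @ y)

# Penalty SUBTRACTED: the sign flips in the normal equations, so the
# solution is inflated instead of shrunk -- complexity is rewarded.
theta_sub = np.linalg.solve(X.T @ X - lam * I, X.T @ y)

print(np.linalg.norm(theta_add) < np.linalg.norm(theta_ols))  # True: shrunk
print(np.linalg.norm(theta_sub) > np.linalg.norm(theta_ols))  # True: inflated
```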
