Why is the penalty term $R(f)$ added to a general loss function in regularization instead of subtracted?
For example,
$$
\mathrm{argmin} \sum L(\theta, \hat{\theta}) + \lambda R(f) \,?
$$
Best Answer
Let me start with the concept of regularization. Regularization is a means to avoid high variance in a model (also known as overfitting). High variance means that the model is following all the noise and errors in the data, i.e. it is too flexible. Since the idea is to control complexity, we want to penalize the model for overfitting.
The parameters of a model are chosen by minimizing its cost function; the best model is the one with the minimum cost. Let me take the example of linear regression.
Cost function and parameter update ($\theta$) of a linear model without regularization:
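For concreteness, a standard formulation (assuming a hypothesis $h_\theta$, $m$ training examples $(x^{(i)}, y^{(i)})$, and learning rate $\alpha$) is:
$$
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
$$
$$
\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
$$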
Cost function and parameter update ($\theta$) of a linear model with regularization:
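With an L2 (ridge) penalty, under the same assumptions as above (with the intercept $\theta_0$ conventionally left unpenalized), this becomes:
$$
J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]
$$
$$
\theta_j := \theta_j \left( 1 - \alpha \frac{\lambda}{m} \right) - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
$$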
So, by using regularization, the parameters are penalized for overfitting. The regularization term is *added* to the cost function; in the resulting gradient-descent update it appears as a term that is subtracted from (shrinks) each parameter, so minimizing the cost also drives the parameters toward smaller values.
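A minimal numerical sketch of this (plain NumPy, hypothetical names, single penalty weight `lam`, intercept left unpenalized) might look like:

```python
import numpy as np

def ridge_cost(theta, X, y, lam):
    """L2-regularized squared-error cost: the penalty is ADDED to the data loss."""
    m = len(y)
    residual = X @ theta - y
    data_loss = (residual @ residual) / (2 * m)
    penalty = lam * np.sum(theta[1:] ** 2) / (2 * m)   # skip the intercept theta[0]
    return data_loss + penalty

def ridge_gradient_step(theta, X, y, lam, alpha):
    """One gradient-descent step; the added penalty shows up as shrinkage of theta."""
    m = len(y)
    grad = X.T @ (X @ theta - y) / m
    grad[1:] += (lam / m) * theta[1:]                  # derivative of the added penalty
    return theta - alpha * grad

# Toy usage: a column of ones for the intercept plus one feature.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = 3.0 + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=50)

theta = np.zeros(2)
for _ in range(500):
    theta = ridge_gradient_step(theta, X, y, lam=1.0, alpha=0.1)
print(theta, ridge_cost(theta, X, y, lam=1.0))
```

Because the penalty enters with a plus sign, any growth in the parameters raises the cost, so the minimizer trades data fit against model complexity.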