Why does increasing the lambda parameter in L2 regularization make the coefficient values converge to zero?
I have just tried to do the math, but it's a little bit rusty.
Let's say that we have a simple linear model as follows:
$y = w_1 \cdot x$
we could write the cost function for ridge regression that is to be minimized:
$\text{cost}(\hat{w}_1, \lambda) = (y - \hat{w}_1 \cdot x)^2 + \lambda \cdot \hat{w}_1^2$
it means that if we set the derivative of the cost with respect to $\hat{w}_1$ to zero:
$\frac{\partial \, \text{cost}}{\partial \hat{w}_1} = -2 \cdot x \cdot (y - \hat{w}_1 \cdot x) + 2 \cdot \lambda \cdot \hat{w}_1 = 0$ so,
$\hat{w}_1 = \frac{x \cdot y}{x^2 + \lambda}$
Since $y$ and $x$ are fixed, the equation implies that increasing $\lambda$ must make the coefficient decrease.
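As a quick numerical check (a minimal sketch in plain Python, with a made-up data point $x = 2$, $y = 6$), the closed-form minimiser $\hat{w}_1 = \frac{x \cdot y}{x^2 + \lambda}$ indeed shrinks toward zero as $\lambda$ grows:

```python
# Toy single-point data (assumed values for illustration); the
# unregularized fit would be w1 = y / x = 3.
x, y = 2.0, 6.0

def ridge_w1(lam, x=x, y=y):
    # Closed-form minimiser of (y - w1*x)**2 + lam * w1**2,
    # obtained by setting the derivative with respect to w1 to zero.
    return x * y / (x**2 + lam)

for lam in [0.0, 1.0, 10.0, 100.0]:
    print(f"lambda={lam:6.1f}  w1_hat={ridge_w1(lam):.4f}")
# w1_hat falls from 3.0 toward 0 as lambda increases
```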
Is that the right way to reason?
Best Answer
Yep, that is one way to think about it, although it seems a tad obscure to me.
I think it's simpler to just look at your $\text{cost}$ equation:
$\text{cost}(\hat{w}_1, \lambda) = (y - \hat{w}_1 \cdot x)^2 + \lambda \cdot \hat{w}_1^2$
We can see from this that, for large $lambda$, our cost increases quadratically with the absolute size of $hat{w_1}$. That is, we are penalising our model for having a large weight: thus to reduce the cost, our $hat{w_1}$ coefficient is shrunk towards zero.
If $lambda$ is small, or zero, this second term doesn't really affect the cost, so $hat{w_1}$ is free to grow as large as it needs to, to minimise the other component of the cost function.
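You can see this shrinkage directly without any calculus: a minimal sketch (assuming the same toy data point $x = 2$, $y = 6$) that minimises the cost over a grid of candidate weights, so the only input is the cost function itself:

```python
# Brute-force minimisation of cost(w1) = (y - w1*x)**2 + lam * w1**2
# over a grid of w1 values, to watch the minimiser shrink as lam grows.
x, y = 2.0, 6.0  # assumed toy data point

def cost(w1, lam):
    return (y - w1 * x) ** 2 + lam * w1 ** 2

def argmin_w1(lam):
    grid = [i / 1000 for i in range(5001)]  # w1 in [0, 5], step 0.001
    return min(grid, key=lambda w: cost(w, lam))

for lam in [0.0, 1.0, 10.0, 100.0]:
    print(f"lambda={lam:6.1f}  w1_hat={argmin_w1(lam):.3f}")
# the minimising w1 decreases monotonically toward zero
```

With $\lambda = 0$ the minimiser is just the least-squares fit; as $\lambda$ grows, the $\lambda \cdot \hat{w}_1^2$ term dominates and pulls the minimiser toward zero, exactly as the answer describes.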