Why does increasing the lambda parameter in L2 regularization make the coefficient values converge to zero?

I have tried to work through the math, but I'm a little rusty.

Let's say that we have a simple linear model:

$y = w_1 \cdot x$

We can write the ridge regression cost function to be minimized as:

$\text{cost}(\hat{w_1}, \lambda) = (y - \hat{w_1} \cdot x)^2 + \lambda \cdot \hat{w_1}^2$

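Since this is a minimization problem, we set the derivative with respect to $\hat{w_1}$ to zero:

$\frac{\partial\, \text{cost}}{\partial \hat{w_1}} = -2 \cdot x \cdot (y - \hat{w_1} \cdot x) + 2 \cdot \lambda \cdot \hat{w_1} = 0$, so (assuming $x \neq 0$)

$y = \left(x + \frac{\lambda}{x}\right) \cdot \hat{w_1}$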

Since $x$ and $y$ are fixed, it is to be expected that increasing $\lambda$ makes the coefficient $\hat{w_1}$ decrease in magnitude, so that the equation still holds.
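The closed form makes this concrete. Setting the derivative of $\text{cost}(\hat{w_1}, \lambda)$ to zero and solving (assuming $x \neq 0$) gives $\hat{w_1} = \frac{x y}{x^2 + \lambda}$, which tends to zero as $\lambda \to \infty$. A minimal sketch, where the helper name `ridge_w1` and the data point $x = 2$, $y = 3$ are illustrative assumptions:

```python
def ridge_w1(x: float, y: float, lam: float) -> float:
    """Closed-form minimiser of (y - w*x)**2 + lam * w**2 for a single data point."""
    return x * y / (x * x + lam)


if __name__ == "__main__":
    x, y = 2.0, 3.0  # hypothetical data point
    # Sweep lambda and print how the estimated coefficient shrinks toward zero.
    for lam in [0.0, 1.0, 10.0, 100.0, 1000.0]:
        print(f"lambda = {lam:7.1f}  ->  w1_hat = {ridge_w1(x, y, lam):.4f}")
```

For this point the estimate falls from 1.5 at $\lambda = 0$ to 1.2 at $\lambda = 1$, and then to roughly 0.43, 0.058 and 0.006 as $\lambda$ grows.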

Is that the right way to reason?


#### Best Answer

Yep, that is one way to think about it, although it seems a tad obscure to me.

I think it's simpler to just look at your $\text{cost}$ equation:

$\text{cost}(\hat{w_1}, \lambda) = (y - \hat{w_1} \cdot x)^2 + \lambda \cdot \hat{w_1}^2$

We can see from this that, for large $\lambda$, our cost increases quadratically with the absolute size of $\hat{w_1}$. That is, we are penalising our model for having a large weight: thus, to reduce the cost, our $\hat{w_1}$ coefficient is shrunk towards zero.
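To follow this argument without any calculus, one can minimise the cost numerically and look at the two terms separately. The sketch below uses ternary search, which is valid here because the cost is a convex quadratic in $\hat{w_1}$ (strictly convex when $x \neq 0$ or $\lambda > 0$). The names `cost` and `minimise_cost`, the search interval, and the data point $x = 2$, $y = 3$ are all illustrative assumptions:

```python
def cost(w: float, x: float, y: float, lam: float) -> float:
    """Ridge cost for one data point: squared error plus L2 penalty."""
    return (y - w * x) ** 2 + lam * w ** 2


def minimise_cost(x: float, y: float, lam: float,
                  lo: float = -100.0, hi: float = 100.0, iters: int = 200) -> float:
    """Ternary search for the minimiser of `cost` on [lo, hi] (assumed to contain it)."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        # Discard the outer third that cannot contain the minimum of a convex function.
        if cost(m1, x, y, lam) < cost(m2, x, y, lam):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


if __name__ == "__main__":
    x, y = 2.0, 3.0  # hypothetical data point
    for lam in [0.0, 1.0, 10.0, 100.0]:
        w = minimise_cost(x, y, lam)
        fit = (y - w * x) ** 2   # squared-error term
        penalty = lam * w ** 2   # L2 penalty term
        print(f"lambda={lam:6.1f}  w1_hat={w:.4f}  fit={fit:.4f}  penalty={penalty:.4f}")
```

As $\lambda$ grows, the minimiser accepts a worse fit in exchange for a smaller weight, which is exactly the shrinkage described above.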

If $\lambda$ is small, or zero, this second term barely affects the cost, so $\hat{w_1}$ is free to grow as large as needed to minimise the other (squared-error) component of the cost function.
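The small-$\lambda$ case can be checked with the closed-form minimiser $\hat{w_1} = \frac{x y}{x^2 + \lambda}$ (obtained by setting the derivative of the cost to zero, assuming $x \neq 0$). At $\lambda = 0$ it reduces to the unpenalised fit $y / x$, which becomes large when $x$ is small. The helper name and the data point $x = 0.01$, $y = 1$ are illustrative assumptions:

```python
def ridge_w1(x: float, y: float, lam: float) -> float:
    """Closed-form minimiser of (y - w*x)**2 + lam * w**2 for a single data point."""
    return x * y / (x * x + lam)


if __name__ == "__main__":
    x, y = 0.01, 1.0  # hypothetical point with a tiny input, so y / x is large
    print("lambda = 0:    ", ridge_w1(x, y, 0.0))   # approximately y / x = 100
    print("lambda = 1e-4: ", ridge_w1(x, y, 1e-4))  # approximately half of y / x
    print("lambda = 1:    ", ridge_w1(x, y, 1.0))   # just under 0.01, close to zero
```

With no penalty the weight is whatever exactly fits the data, however large; once $\lambda$ is comparable to $x^2$, the penalty starts to dominate.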

### Similar Posts:

- Solved – Showing that ridge regression is a solution to the following optimization problem
- Solved – Computing variance from moment generating function of exponential distribution
- Solved – Lasso regression solutions