# Solved – Why increasing the lambda parameter in L2 regularization makes the coefficient values converge to zero

Why does increasing the lambda parameter in L2 regularization make the coefficient values converge to zero?

I have just tried to do the math, but I am a little bit rusty.

Let's say that we have a simple linear model as follows:
$$y = w_1 \cdot x$$

We can then write the cost function for ridge regression that is to be minimized:

$$\text{cost}(\hat{w}_1, \lambda) = (y - \hat{w}_1 \cdot x)^2 + \lambda \cdot \hat{w}_1^2$$

It means that if we minimize the cost by setting its derivative with respect to $$\hat{w}_1$$ to zero:

$$\frac{d\,\text{cost}}{d\hat{w}_1} = -2 \cdot x \cdot (y - \hat{w}_1 \cdot x) + 2 \cdot \lambda \cdot \hat{w}_1 = 0$$ so,

$$\hat{w}_1 = \frac{x \cdot y}{x^2 + \lambda}$$

Since $$y$$ and $$x$$ are fixed, it is to be expected that increasing $$\lambda$$ makes the coefficient decrease: the denominator grows while the numerator stays the same, so $$\hat{w}_1$$ shrinks towards zero.
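A quick numerical check of that closed form (a minimal sketch; the single observation $$x = 2, y = 4$$ and the grid of $$\lambda$$ values are arbitrary choices for illustration):

```python
# Closed-form minimizer of (y - w*x)^2 + lambda*w^2 for one observation:
# w_hat = x*y / (x^2 + lambda). The (x, y) pair below is made up.
x, y = 2.0, 4.0

for lam in [0.0, 1.0, 10.0, 100.0]:
    w_hat = x * y / (x**2 + lam)
    print(f"lambda = {lam:6.1f}  ->  w_hat = {w_hat:.4f}")

# lambda =    0.0  ->  w_hat = 2.0000   (the ordinary least-squares value, y/x)
# lambda =    1.0  ->  w_hat = 1.6000
# lambda =   10.0  ->  w_hat = 0.5714
# lambda =  100.0  ->  w_hat = 0.0769
```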

Is that the right way to reason?


Yep, that is one way to think about it, although it seems a tad obscure to me.

I think it's simpler to just look at your $$\text{cost}$$ equation:

$$\text{cost}(\hat{w}_1, \lambda) = (y - \hat{w}_1 \cdot x)^2 + \lambda \cdot \hat{w}_1^2$$

We can see from this that, for large $$\lambda$$, our cost increases quadratically with the absolute size of $$\hat{w}_1$$. That is, we are penalising our model for having a large weight: thus, to reduce the cost, our $$\hat{w}_1$$ coefficient is shrunk towards zero.

If $$\lambda$$ is small, or zero, this second term doesn't really affect the cost, so $$\hat{w}_1$$ is free to grow as large as it needs to, to minimise the other component of the cost function.
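To see the same shrinkage with more than one coefficient, here is a minimal sketch (the toy data, the true weights, and the grid of $$\lambda$$ values are all made up for illustration) that evaluates the closed-form ridge estimate $$\hat{w} = (X^T X + \lambda I)^{-1} X^T y$$ for increasing $$\lambda$$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem: 50 samples, 3 features, known weights plus a little noise.
X = rng.normal(size=(50, 3))
true_w = np.array([3.0, -2.0, 1.0])
y = X @ true_w + 0.1 * rng.normal(size=50)

# Closed-form ridge solution: w_hat = (X^T X + lambda * I)^(-1) X^T y.
for lam in [0.0, 1.0, 10.0, 100.0, 1000.0]:
    w_hat = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    print(f"lambda = {lam:7.1f}  ->  w_hat = {np.round(w_hat, 3)}")
```

The printed estimates move towards zero as $$\lambda$$ grows, because the quadratic penalty makes large weights increasingly expensive relative to the fit term.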
