# Ridge Regression: how to show squared bias increases as $\lambda$ increases

I have a Ridge regression model to estimate the coefficients of the true model $$y = X\beta + \epsilon.$$ I have the standard model where $$\mathbb{E}[\epsilon] = 0, \quad \mathrm{Var}(\epsilon) = I.$$ The ridge estimator of $$\beta$$ is: $$\hat{\beta}^{\mathrm{Ridge}} = (X^\top X + \lambda I)^{-1} X^\top y.$$

Assume we have a fixed test point $$x_0$$. I have proved that increasing $$\lambda$$ decreases the variance of the estimate $$\hat{f}(x_0) = x_0^\top (X^\top X + \lambda I)^{-1} X^\top y.$$
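As a numerical sanity check (not a proof), the decreasing variance can be seen directly from the closed form $$\mathrm{Var}(\hat{f}(x_0)) = x_0^\top (X^\top X + \lambda I)^{-1} X^\top X (X^\top X + \lambda I)^{-1} x_0,$$ which follows from $$\mathrm{Var}(\epsilon) = I$$ (so $$\sigma^2 = 1$$). A minimal sketch, where the design matrix `X` and the point `x0` are arbitrary illustrative choices:

```python
import numpy as np

# Numerical sanity check (not a proof): the prediction variance
#   Var(f_hat(x0)) = x0^T (X^T X + lam I)^{-1} X^T X (X^T X + lam I)^{-1} x0
# (sigma^2 = 1 here, since Var(eps) = I) decreases as lambda grows.
# X and x0 are arbitrary illustrative choices, not from the question.
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
x0 = rng.normal(size=p)
XtX = X.T @ X
I = np.eye(p)

def pred_var(lam):
    v = np.linalg.solve(XtX + lam * I, x0)   # (X^T X + lam I)^{-1} x0
    return v @ XtX @ v

lams = [0.0, 0.1, 1.0, 10.0, 100.0]
variances = [pred_var(lam) for lam in lams]  # monotonically decreasing
```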

Now I want to show that increasing $$\lambda$$ makes the squared bias of the test estimate steadily increase.

I thought of using the bias-variance tradeoff, but it does not work, since the decomposition only tells us
$$\mathrm{Error}(x_0) = \text{Irreducible Error} + \mathrm{Bias}^2(\hat{f}(x_0)) + \mathrm{Variance}(\hat{f}(x_0)).$$
To conclude that increased variance implies decreased bias, we would need $$\mathrm{Error}(x_0)$$ to stay constant, which is not the case here.

So, how can I show that the squared bias of the ridge estimate at the test point steadily increases with increasing $$\lambda$$?


I do not know if you are still interested in this issue, but I think it will be useful for your problem to look at the limiting behaviour of the estimator's mean squared error as the penalty parameter approaches infinity.

We can denote by $$\hat{\beta}_{r} = (X^\top X + \lambda I)^{-1} X^\top y$$ the ridge estimator and by $$\hat{\beta} = (X^\top X)^{-1} X^\top y$$ the OLS estimator (which is unbiased, hence $$E(\hat{\beta}) = \beta$$). Now, if we define $$K = (X^\top X + \lambda I)^{-1} X^\top X,$$ we can verify that $$\hat{\beta}_{r} = K \hat{\beta}$$ (so $$K$$ transforms the OLS estimator into the ridge one).
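The identity $$\hat{\beta}_{r} = K \hat{\beta}$$ is easy to verify numerically. A minimal sketch, where `X`, `y`, and the true `beta` are arbitrary illustrative choices:

```python
import numpy as np

# Sketch: verify beta_ridge = K @ beta_ols for one penalty value.
# X, y, and beta are arbitrary illustrative choices.
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + rng.normal(size=n)

lam = 2.0
XtX = X.T @ X
I = np.eye(p)

beta_ols = np.linalg.solve(XtX, X.T @ y)
beta_ridge = np.linalg.solve(XtX + lam * I, X.T @ y)

# K = (X^T X + lam I)^{-1} X^T X maps the OLS estimator to the ridge one.
K = np.linalg.solve(XtX + lam * I, XtX)
```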

Then, keeping in mind the definition of $$K$$, it can be demonstrated that (see e.g. Hoerl and Kennard, 1970):

$$\begin{aligned} \mathrm{MSE}(\hat{\beta}_{r}) &= E[(\hat{\beta}_{r} - \beta)^\top (\hat{\beta}_{r} - \beta)] = \mathrm{Var}(\hat{\beta}_{r}) + [\mathrm{Bias}(\hat{\beta}_{r})]^2 \\ &= \sigma^{2}\,\mathrm{tr}\{K (X^{\top} X)^{-1} K^{\top}\} + \beta^{\top}(K - I)^{\top}(K - I)\beta, \\ \mathrm{Var}(\hat{\beta}_{r}) &= \sigma^{2}\,\mathrm{tr}\{K (X^{\top} X)^{-1} K^{\top}\}, \\ [\mathrm{Bias}(\hat{\beta}_{r})]^2 &= \beta^{\top}(K - I)^{\top}(K - I)\beta. \end{aligned}$$

From the above we can compute $$\lim_{\lambda \rightarrow \infty} \mathrm{MSE}(\hat{\beta}_{r}) = \beta^\top \beta,$$

which is exactly the squared bias of the zero estimator: as $$\lambda \rightarrow \infty$$ the ridge estimator shrinks to zero and, as you pointed out, its variance vanishes, so in the limit the MSE consists of squared bias alone. I hope this helps a bit (and that the notation is clear enough).
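The monotone growth of the squared bias $$\beta^{\top}(K - I)^{\top}(K - I)\beta$$ toward the limit $$\beta^\top \beta$$ can also be checked numerically. A minimal sketch, where `X` and `beta` are arbitrary illustrative choices:

```python
import numpy as np

# Sketch: the squared bias beta^T (K - I)^T (K - I) beta grows with
# lambda and tends to beta^T beta; X and beta are illustrative choices.
rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])
XtX = X.T @ X
I = np.eye(p)

def sq_bias(lam):
    K = np.linalg.solve(XtX + lam * I, XtX)
    d = (K - I) @ beta
    return d @ d

lams = [0.0, 0.1, 1.0, 10.0, 100.0, 1e6]
biases = [sq_bias(lam) for lam in lams]  # monotonically increasing
limit = beta @ beta                      # value approached as lam -> infinity
```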
