# Solved – Lucid explanation for “numerical stability of matrix inversion” in ridge regression and its role in reducing overfit

I understand that we can employ regularization in a least squares regression problem as

$$\boldsymbol{w}^* = \operatorname*{argmin}_{\boldsymbol{w}} \left[ (\boldsymbol{y}-\boldsymbol{X}\boldsymbol{w})^T(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{w}) + \lambda\|\boldsymbol{w}\|^2 \right]$$

and that this problem has a closed-form solution as:

$$\hat{\boldsymbol{w}} = (\boldsymbol{X}^T\boldsymbol{X}+\lambda\boldsymbol{I})^{-1}\boldsymbol{X}^T\boldsymbol{y}.$$

We see that in the second equation, regularization simply adds $\lambda$ to the diagonal of $\boldsymbol{X}^T\boldsymbol{X}$, which is done to improve the numerical stability of the matrix inversion.
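
As a quick illustration of the stability claim (not part of the original question; the data here is made up and NumPy is assumed), the following sketch builds a design matrix with two nearly collinear columns, so that $\boldsymbol{X}^T\boldsymbol{X}$ is nearly singular, and then compares its condition number before and after adding $\lambda\boldsymbol{I}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two nearly collinear columns: the second is the first plus tiny noise,
# so X^T X is almost singular (ill-conditioned).
x1 = rng.normal(size=100)
X = np.column_stack([x1, x1 + 1e-6 * rng.normal(size=100)])

XtX = X.T @ X
lam = 1.0  # ridge parameter lambda (illustrative value)

# Condition number of X^T X alone vs. with lambda added to the diagonal.
print(np.linalg.cond(XtX))                    # huge: inversion amplifies noise
print(np.linalg.cond(XtX + lam * np.eye(2)))  # modest: inversion is well-behaved
```

The condition number bounds how much relative error in the inputs can be amplified by the inversion; adding $\lambda$ raises every eigenvalue of $\boldsymbol{X}^T\boldsymbol{X}$ by $\lambda$, which lifts the near-zero eigenvalues that cause the trouble.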

My current (crude) understanding of numerical stability is that if a function becomes more numerically stable, its output is less affected by noise in its inputs. I am having difficulty relating this notion of improved numerical stability to the bigger picture of how it avoids or reduces overfitting.

I have tried looking at Wikipedia and a few other university websites, but they don't go deep into explaining why this is so.
