So I'm taking Andrew Ng's course on machine learning (great course, only comment is that its lacking a lot of math) and we came across the analytical solution to a model using Normal equations with the regularization penalty.

Andrew claims that it is possible to prove that the matrix below, inside the parentheses, is always invertible but I'm stuck wondering how to do it.

$$theta = (X^TX + lambda [M])^{-1} X^Ty$$

where $M$ is an $(n+1)(n+1)$ matrix and $lambda$ is the regularization parameter

$$M = begin{bmatrix}

0 & 0 & 0 \

0 & 1 & 0\

0 & 0 & 1

end{bmatrix}$$

**Contents**hide

#### Best Answer

$X^TX$ is either PD or PSD.* If it's PD, it's already invertible. If it's PSD, its smallest eigenvalue is $0$. For any $lambda>0$, you're making the smallest eigenvalue inside the parentheses positive.

*Consider $(Xa)^2$; we can write $||Xa||^2_2ge0implies a^TX^TXage 0$. The LHS is nonnegative, so the RHS must also be. And the RHS is immediately the definition of PSD; $aneq 0$.

As whuber points out, $(X^TX)_{1,1}$ must be positive for this to work. This will be true whenever the first column of $X$ is not a 0 vector. (We can verify that easily: the 1,1 entry is the inner product of the first column of X with itself, which must be nonnegative for real values because it is formed as the sum of squares and squares of reals are nonnegative.)

### Similar Posts:

- Solved – How to prove this regularized matrix is invertible
- Solved – How to prove this regularized matrix is invertible
- Solved – Inverse of the covariance matrix of a multivariate normal distribution
- Solved – spectral clustering/theory, is there any meaning for the magnitude of values in eigenvectors
- Solved – Frisch-Waugh-Lovell theorem: How do we know this matrix is invertible