# Solved – Showing that ridge regression is a solution to the following optimization problem

\$\$\hat{\theta}=\operatorname*{argmin}_{\theta}\{\,\|y-X\theta\|_2^2+\lambda\|\theta\|_2^2\,\},\$\$ where \$X\$ is an \$n\times p\$ matrix.

We have: if \$y=X\theta+\varepsilon\$, then \$\$\hat{\theta}^{\text{ridge}}=(X^TX+\lambda I)^{-1}X^Ty.\$\$ So I'm confused, because if \$y=X\theta+\varepsilon\$, then \$\|y-X\theta\|_2^2+\lambda\|\theta\|_2^2=\|\varepsilon\|_2^2+\lambda\|\theta\|_2^2\$. But I don't see how to show that
\$\$(X^TX+\lambda I)^{-1}X^Ty=\operatorname*{argmin}_{\theta}\{\,\|y-X\theta\|_2^2+\lambda\|\theta\|_2^2\}.\$\$ Any help would be much appreciated. Thank you. (Edited because the question was flagged as a duplicate of a different problem.)


I think about the problem in summation notation.

The loss is defined, as you said, as \$L = \sum_{i=1}^{N}\Big(\sum_{j=1}^{M}\theta_{j}X_{ij} - y_{i}\Big)^{2}+ \lambda\sum_{j=1}^{M}\theta_{j}^{2}\$

You can differentiate this with respect to \$\theta_{k}\$ to find:

\$\frac{\partial L}{\partial \theta_{k}} =\sum_{i=1}^{N}2\Big(\sum_{j=1}^{M}\theta_{j}X_{ij}-y_{i}\Big)X_{ik} +2\lambda \theta_{k}\$
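This partial derivative is easy to confirm numerically with a central finite difference (a minimal NumPy sketch; the sizes, seed, and penalty \$\lambda\$ are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 8, 3                      # toy dimensions (assumption)
lam = 0.3                        # arbitrary ridge penalty (assumption)
X = rng.normal(size=(N, M))
y = rng.normal(size=N)
theta = rng.normal(size=M)

def L(t):
    # the loss above: sum_i (sum_j t_j X_ij - y_i)^2 + lam * sum_j t_j^2
    return np.sum((X @ t - y) ** 2) + lam * np.sum(t ** 2)

k, eps = 0, 1e-6
# analytic partial derivative from the formula above
analytic = np.sum(2 * (X @ theta - y) * X[:, k]) + 2 * lam * theta[k]
# central finite-difference approximation of dL/dtheta_k
e_k = np.zeros(M)
e_k[k] = eps
numeric = (L(theta + e_k) - L(theta - e_k)) / (2 * eps)
assert np.isclose(analytic, numeric, rtol=1e-5)
```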

Note that \$\sum_{i=1}^{N}X_{ik}\sum_{j=1}^{M}\theta_{j}X_{ij}=\sum_{i=1}^{N}X_{ik}(X\cdot \theta)_{i}=(X^{T}\cdot X \cdot \theta)_{k}\$

and \$\sum_{i=1}^{N}X_{ik}y_{i}=(X^{T}\cdot y)_{k}\$
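Both index identities can be checked on random data (a small NumPy sketch; the dimensions and seed are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 6, 3                      # toy dimensions (assumption)
X = rng.normal(size=(N, M))
y = rng.normal(size=N)
theta = rng.normal(size=M)
k = 1                            # any fixed column index

# sum_i X_ik * (sum_j theta_j X_ij)  equals  (X^T X theta)_k
lhs = sum(X[i, k] * (X @ theta)[i] for i in range(N))
assert np.isclose(lhs, (X.T @ X @ theta)[k])

# sum_i X_ik * y_i  equals  (X^T y)_k
lhs2 = sum(X[i, k] * y[i] for i in range(N))
assert np.isclose(lhs2, (X.T @ y)[k])
```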

Putting this together, setting \$\frac{\partial L}{\partial \theta_{k}}=0\$ and dividing by \$2\$:

\$(X^{T}\cdot X\cdot \theta)_{k}-(X^{T}\cdot y)_{k} +\lambda \theta_{k}=0 \hspace{5mm}\forall k\$

which you can rewrite as a vector equation:

\$X^{T}\cdot y= (X^{T}\cdot X + \lambda I)\cdot \theta\$

and thus, finally (note that \$X^{T}X + \lambda I\$ is positive definite, hence invertible, for any \$\lambda>0\$):

\$\theta = (X^{T}\cdot X + \lambda I)^{-1}\cdot X^{T}\cdot y\$

So this has shown that if you assume your loss is given by \$\|y - X\cdot \theta \|_{2}^{2}+\lambda \|\theta\|_{2}^{2}\$ and you wish to find the \$\theta\$ which minimises this loss, then \$\theta = (X^{T}\cdot X + \lambda I)^{-1}\cdot X^{T}\cdot y\$ is the solution. Hope this answers your question.
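As a final sanity check, the closed-form solution can be verified numerically: the gradient vanishes at \$\hat{\theta}\$, and no perturbation lowers the loss (a minimal NumPy sketch; the sizes, seed, and \$\lambda\$ are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 4                     # toy dimensions (assumption)
lam = 0.5                        # arbitrary ridge penalty (assumption)
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

def loss(theta):
    # ||y - X theta||_2^2 + lam * ||theta||_2^2
    return np.sum((y - X @ theta) ** 2) + lam * np.sum(theta ** 2)

# closed-form ridge solution: (X^T X + lam I)^{-1} X^T y
theta_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# the gradient 2(X^T X theta - X^T y) + 2 lam theta vanishes at theta_hat
grad = 2 * (X.T @ X @ theta_hat - X.T @ y) + 2 * lam * theta_hat
assert np.allclose(grad, 0)

# random perturbations can only increase the (strictly convex) loss
for _ in range(100):
    assert loss(theta_hat) <= loss(theta_hat + 0.1 * rng.normal(size=p))
```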
