Solved – Why does $l_2$ norm regularization not have a square root

This question concerns Ridge Regression's cost function specifically, since Ridge Regression is based on the $l_2$ norm. We would therefore expect the cost function to be:

$$J(\theta)=\mathrm{MSE}(\theta) + \alpha\sqrt{\sum_{i=1}^{n}\theta_i^2}$$

The actual cost function, however, is:

$$J(\theta)=\mathrm{MSE}(\theta) + \alpha\frac{1}{2}\sum_{i=1}^{n}\theta_i^2$$

One factor to consider is computational simplicity. Dropping the square root keeps the penalty differentiable everywhere (the $l_2$ norm itself is not differentiable at $\theta = 0$) and gives the gradient a much simpler form; the factor of $\frac{1}{2}$ is just a convention that cancels the $2$ produced by differentiation.
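For concreteness (this short derivation is not part of the original answer), compare the per-coordinate gradients of the two penalties:

$$\frac{\partial}{\partial\theta_i}\left(\alpha\frac{1}{2}\sum_{j=1}^{n}\theta_j^2\right)=\alpha\,\theta_i, \qquad \frac{\partial}{\partial\theta_i}\left(\alpha\sqrt{\sum_{j=1}^{n}\theta_j^2}\right)=\alpha\,\frac{\theta_i}{\|\theta\|_2}.$$

The squared penalty contributes a term that is linear in $\theta_i$, while the square-root version divides by $\|\theta\|_2$ and is undefined at $\theta=0$.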

Also, minimizing $\mathrm{MSE}$ subject to $\|\theta\|_2\le c$ is equivalent to minimizing $\mathrm{MSE}$ subject to $\|\theta\|_2^2\le c^2$: squaring is monotone on nonnegative values, so the two constraints describe exactly the same feasible set and have the same minimizer. The corresponding penalized problems then trace out the same solutions, differing only in how $\alpha$ maps to $c$.
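
As a quick numerical sanity check (a minimal sketch, not from the original post; the data and constraint level $c$ are arbitrary choices), both constraint formulations can be handed to a generic solver and yield the same minimizer:

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic toy data; any X, y would do.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

def mse(theta):
    r = X @ theta - y
    return r @ r / len(y)

c = 1.0
theta0 = np.zeros(3)

# Constraint ||theta||_2 <= c  (SLSQP expects g(theta) >= 0).
sol_norm = minimize(mse, theta0, method="SLSQP",
                    constraints={"type": "ineq",
                                 "fun": lambda t: c - np.linalg.norm(t)})

# Constraint ||theta||_2^2 <= c^2 -- same feasible set, so same minimizer.
sol_sq = minimize(mse, theta0, method="SLSQP",
                  constraints={"type": "ineq",
                               "fun": lambda t: c**2 - t @ t})

print(sol_norm.x)
print(sol_sq.x)  # agrees with sol_norm.x up to solver tolerance
```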
