# Solved – How is the lasso orthogonal design case solution derived

In the orthogonal design case of the lasso, we get $\hat{\beta}_j^{\text{lasso}} = 0$ if $|\hat{\beta}_j| \le \lambda/2$. Why?

I've seen the answer and reproduced the derivation myself, but I don't understand why the final step holds.

We begin with the definition of the lasso,
$$\hat{\beta}^{\text{lasso}} = \underset{\beta}{\operatorname{argmin}} \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$

In the orthogonal design case where $X^T X = I$, we have $\hat{\beta} = (X^T X)^{-1} X^T y = X^T y$.

$$
\begin{aligned}
L(\beta, \lambda) & = \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \\
& = (Y - X\beta)^T (Y - X\beta) + \lambda \sum_{j=1}^{p} |\beta_j| \\
& = Y^T Y - 2\hat{\beta}^T \beta + \beta^T \beta + \lambda \sum_{j=1}^{p} |\beta_j| \\
& = Y^T Y + \sum_{j=1}^{p} L_j(\beta_j, \lambda)
\end{aligned}
$$

where the third line uses $X^T X = I$ and $\hat{\beta} = X^T Y$, and where $L_j(\beta_j, \lambda) = -2\hat{\beta}_j \beta_j + \beta_j^2 + \lambda |\beta_j|$.
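To see the coordinate-wise decomposition concretely, here is a minimal numerical check (assuming NumPy and a randomly generated orthonormal design; the variable names are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 20, 5, 0.7

# Orthonormal design via QR, so that Q.T @ Q = I_p
Q, _ = np.linalg.qr(rng.normal(size=(n, p)))
y = rng.normal(size=n)
beta_hat = Q.T @ y          # OLS estimate under the orthonormal design
beta = rng.normal(size=p)   # an arbitrary candidate coefficient vector

full = np.sum((y - Q @ beta) ** 2) + lam * np.sum(np.abs(beta))
per_coord = y @ y + np.sum(-2 * beta_hat * beta + beta ** 2 + lam * np.abs(beta))

print(np.allclose(full, per_coord))  # True: the objective separates over coordinates
```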

Setting aside $\beta_j = 0$, take the derivative w.r.t. $\beta_j$ for $|\beta_j| > 0$:
$$\frac{\partial L_j(\beta_j, \lambda)}{\partial \beta_j} = -2\hat{\beta}_j + 2\beta_j + \lambda \operatorname{sign}(\beta_j)$$

and $\hat{\beta}^{\text{lasso}}_j$ is either zero or solves
$$\beta_j + \lambda \operatorname{sign}(\beta_j)/2 = \hat{\beta}_j,$$

which is
$$
\hat{\beta}^{\text{lasso}}_j =
\begin{cases}
\hat{\beta}_j - \lambda/2, & \text{if } \hat{\beta}_j > \lambda/2 \\
\hat{\beta}_j + \lambda/2, & \text{if } \hat{\beta}_j < -\lambda/2
\end{cases}
$$
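As a quick sanity check of this closed form, one can compare it against a brute-force minimization of $L_j$ on a dense grid (a minimal sketch assuming NumPy; `soft_threshold` is just an illustrative helper name):

```python
import numpy as np

def soft_threshold(b_hat, lam):
    """Closed-form minimizer of L_j(b) = -2*b_hat*b + b**2 + lam*|b|."""
    return np.sign(b_hat) * max(abs(b_hat) - lam / 2, 0.0)

lam = 1.0
grid = np.linspace(-3, 3, 200001)          # dense grid of candidate beta_j values
for b_hat in [-1.3, -0.4, 0.0, 0.2, 0.9]:  # includes cases with |b_hat| <= lam/2
    L = -2 * b_hat * grid + grid ** 2 + lam * np.abs(grid)
    print(b_hat, soft_threshold(b_hat, lam), grid[np.argmin(L)])
```

For $|\hat{\beta}_j| \le \lambda/2$ the grid minimizer lands at zero, which is exactly the claim asked about below.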

My question is about the following step of the derivation:

If $|\hat{\beta}_j| \le \lambda/2$, then $-2\hat{\beta}_j \beta_j \ge -2|\hat{\beta}_j|\,|\beta_j| \ge -\lambda |\beta_j|$, so
$$L_j(\beta_j, \lambda)
= -2\hat{\beta}_j \beta_j + \beta_j^2 + \lambda |\beta_j|
\ge -\lambda |\beta_j| + \lambda |\beta_j|
\ge 0 = L_j(0, \lambda),$$
and we can tell that $\hat{\beta}_j^{\text{lasso}} = 0$ if $|\hat{\beta}_j| \le \lambda/2$. (Why? How can you tell?)

Why is $\mathbf{\hat{\beta}_j^{\text{lasso}} = 0}$? The observation that $L_j(\beta_j, \lambda) \ge L_j(0, \lambda)$ does not seem, by itself, to justify it.


Your derivation is not entirely precise: you are not taking the derivative but the subderivative, because the function $|x|$ is not differentiable at $x = 0$. The subderivative $s$ of the absolute value at $x = 0$ is any $s \in [-1, 1]$.

Thus, the conditions you derived are for the case $\hat{\beta}^{\text{lasso}}_j \neq 0$, where the subdifferential of the absolute value is indeed the sign. Now consider the case $\hat{\beta}^{\text{lasso}}_j = 0$. By the KKT conditions, this happens when $-\hat{\beta}_j^{\text{OLS}} + s\,\frac{\lambda}{2} = 0$ for some $s \in [-1, 1]$, which implies $|\hat{\beta}_j^{\text{OLS}}| \leq \frac{\lambda}{2}$.

## The LASSO problem

For the sake of completeness, I will write down the lasso problem here. Our goal is to minimize

$$\min_{\beta} \| Y - X\beta \|_2^2 + \lambda \|\beta\|_1$$

where $\|\cdot\|_1$ is the $\ell_1$ norm. This is a convex optimization problem, and the optimum is characterized by the KKT conditions:

$$ -2X'(Y - X\beta) + \lambda s = 0 $$

where $s$ is the subgradient of the $\ell_1$ norm, that is, $s_j = \operatorname{sign}(\beta_j)$ if $\beta_j \neq 0$ and $s_j \in [-1, 1]$ if $\beta_j = 0$.

In the orthonormal case, $X'Y = \hat{\beta}^{\text{OLS}}$ and $X'X = I$, which simplifies this to:

$$ -2\hat{\beta}^{\text{OLS}} + 2\beta + \lambda s = 0 $$

Now consider the case where the solution has $\beta_j = 0$. For this to be true we must have $-2\hat{\beta}_j^{\text{OLS}} + \lambda s_j = 0$, which implies $|\hat{\beta}_j^{\text{OLS}}| \leq \frac{\lambda}{2}$ since $s_j \in [-1, 1]$. Since this is a convex program, the KKT conditions are sufficient and the implication works both ways: $|\hat{\beta}_j^{\text{OLS}}| \leq \frac{\lambda}{2}$ implies $\beta_j = 0$.
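Under the same orthonormal-design assumption, these KKT conditions can be checked numerically at the soft-thresholded solution (a minimal sketch assuming NumPy and simulated data; names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 50, 8, 1.5

# Orthonormal design via QR, so X.T @ X = I_p
X, _ = np.linalg.qr(rng.normal(size=(n, p)))
y = rng.normal(size=n)

beta_ols = X.T @ y
# Soft-thresholding at lam/2, i.e. the closed-form lasso solution derived above
beta_lasso = np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam / 2, 0.0)

# KKT: -2 X'(y - X beta) + lam * s = 0 for some valid subgradient s
s = 2 * X.T @ (y - X @ beta_lasso) / lam
nonzero = beta_lasso != 0
print(np.allclose(s[nonzero], np.sign(beta_lasso[nonzero])))  # s_j = sign(beta_j) on the active set
print(np.all(np.abs(s[~nonzero]) <= 1 + 1e-12))               # |s_j| <= 1 where beta_j = 0
```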
