In the orthogonal design case of the lasso, we get $\hat{\beta}_j^{\text{lasso}} = 0$ if $|\hat{\beta}_j| \le \lambda /2$. **WHY?**

I've seen the answer and can reproduce the derivation myself, but I don't understand why the final step holds.

We begin with the definition of the lasso,

$$\hat{\beta}^{\text{lasso}} = \underset{\beta}{\operatorname{argmin}} \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p}\beta_{j}x_{ij} \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| $$

In the orthogonal design case where $X^T X= I$, $\hat{\beta} = (X^TX)^{-1}X^{T}y = X^Ty$
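As a quick numerical sanity check (a sketch of mine, not from the post), one can build an orthonormal design from a QR decomposition and confirm that OLS reduces to $X^Ty$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 5

# Q from a (reduced) QR decomposition has orthonormal columns: Q'Q = I_p.
X, _ = np.linalg.qr(rng.normal(size=(n, p)))
y = rng.normal(size=n)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^{-1} X'y
print(np.allclose(X.T @ X, np.eye(p)))         # True: X'X = I
print(np.allclose(beta_ols, X.T @ y))          # True: OLS reduces to X'y
```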

\begin{align} L(\beta, \lambda) & = \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p}\beta_{j}x_{ij} \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \\
& = (Y - X \beta)^T(Y - X \beta) + \lambda \sum_{j=1}^{p} |\beta_j| \\
& = Y^TY - 2\hat{\beta}^T\beta + \beta^T \beta + \lambda \sum_{j=1}^{p} |\beta_j| \\
& = Y^TY + \sum_{j=1}^{p} L_j(\beta_j, \lambda)
\end{align}

where $L_j(\beta_j, \lambda) = -2 \hat{\beta}_j \beta_j + \beta^2_j + \lambda |\beta_j|$.
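Completing the square makes the one-dimensional nature of each subproblem explicit: up to a constant in $\beta_j$,

$$L_j(\beta_j, \lambda) = (\beta_j - \hat{\beta}_j)^2 + \lambda |\beta_j| - \hat{\beta}_j^2,$$

so minimizing $L(\beta, \lambda)$ reduces to $p$ separate scalar problems, one per coordinate.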

Setting $\beta_j = 0$ aside for the moment, take the derivative w.r.t. $\beta_j$ for $|\beta_j| > 0$:

$$\frac{\partial L_j(\beta_j, \lambda)}{\partial \beta_j} = -2 \hat{\beta}_j + 2\beta_j + \lambda\, \text{sign}(\beta_j)$$

and $\hat{\beta}^{\text{lasso}}_j$ is either zero or solves

$$\beta_j + \lambda\, \text{sign}(\beta_j) / 2 = \hat{\beta}_j,$$

which yields

$$
\hat{\beta}^{\text{lasso}}_j =
\begin{cases}
\hat{\beta}_j - \lambda/2, & \text{if } \hat{\beta}_j > \lambda/2 \\
\hat{\beta}_j + \lambda/2, & \text{if } \hat{\beta}_j < -\lambda/2
\end{cases}
$$
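This is exactly soft-thresholding of $\hat{\beta}_j$ at $\lambda/2$. As a quick check (a sketch of mine, not part of the original post; `soft_threshold` and the grid search are illustrative names), one can minimize $L_j$ by brute force over a fine grid and compare with the closed form:

```python
import numpy as np

def soft_threshold(b_hat, lam):
    """Closed-form per-coordinate lasso solution under an orthonormal design."""
    return np.sign(b_hat) * np.maximum(np.abs(b_hat) - lam / 2.0, 0.0)

lam = 1.0
grid = np.linspace(-3, 3, 200001)  # candidate values of beta_j

for b_hat in [-1.2, -0.4, 0.0, 0.3, 2.0]:
    # Per-coordinate objective: L_j(b) = -2*b_hat*b + b^2 + lam*|b|
    L = -2 * b_hat * grid + grid**2 + lam * np.abs(grid)
    brute = grid[np.argmin(L)]
    print(b_hat, soft_threshold(b_hat, lam), round(float(brute), 3))
```

Coordinates with $|\hat{\beta}_j| \le \lambda/2 = 0.5$ come out exactly zero in both columns.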

**My question is about the following step of the derivation.**

If $|\hat{\beta}_j| \le \lambda / 2$, we get

$$L_j(\beta_j, \lambda)
= -2 \hat{\beta}_j \beta_j + \beta^2_j + \lambda |\beta_j|
\ge -\lambda |\beta_j| + \beta^2_j + \lambda |\beta_j|
= \beta^2_j \ge 0 = L_j(0, \lambda),$$

since $-2 \hat{\beta}_j \beta_j \ge -2 |\hat{\beta}_j|\, |\beta_j| \ge -\lambda |\beta_j|$,

and we can tell that $\hat{\beta}_j^{\text{lasso}} = 0$ if $|\hat{\beta}_j| \le \lambda /2$ **(Why? How can you tell?)**

**Why $\hat{\beta}_j^{\text{lasso}} = 0$?** The inequality $L_j(\beta_j, \lambda) \ge L_j(0, \lambda)$ does not seem to me to justify it.


#### Best Answer

Your derivation is not quite precise: you are not actually taking the derivative but the subderivative, because the function $|x|$ is not differentiable at $x = 0$. The subderivative $s$ of the absolute value at $x = 0$ is any $s \in [-1, 1]$.
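Written out, the subdifferential of the absolute value is

$$\partial |x| = \begin{cases} \{\operatorname{sign}(x)\}, & x \neq 0 \\ [-1,\, 1], & x = 0 \end{cases}$$

and it is this set-valued object, not a pointwise derivative, that enters the optimality conditions below.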

Thus, the conditions you derived are for the case $\hat{\beta}^{\text{lasso}}_j \neq 0$, where indeed the subdifferential of the absolute value equals the sign. Now consider the case $\hat{\beta}^{\text{lasso}}_j = 0$. By the KKT conditions, this happens when $-\hat{\beta}_j^{\text{ols}} + s\frac{\lambda}{2} = 0$, which implies $|\hat{\beta}_j^{\text{ols}}| \leq \frac{\lambda}{2}$, since $s \in [-1, 1]$ when $\hat{\beta}^{\text{lasso}}_j = 0$.

**The LASSO problem**

For the sake of completeness I will write down the lasso problem here. Our goal is to minimize

$$\min_{\beta} \| Y - X\beta\|_2^2 + \lambda\|\beta\|_1$$

where $\|\cdot\|_1$ is the $\ell_1$ norm. This is a convex optimization problem, and the optimum is characterized by the KKT conditions:

$$ -2X'(Y - X\beta) + \lambda s = 0 $$

where $s$ is the subgradient of the $\ell_1$ norm, that is, $s_j = \operatorname{sign}(\beta_j)$ if $\beta_j \neq 0$ and $s_j \in [-1, 1]$ if $\beta_j = 0$.

In the orthonormal case, $X'Y = \hat{\beta}^{\text{OLS}}$ and $X'X = I$, simplifying this to:

$$ -2\hat{\beta}^{\text{OLS}} + 2\beta + \lambda s = 0 $$

Thus, consider the case where the solution has $\beta_j = 0$. For this to hold we must have $-2\hat{\beta}_j^{\text{OLS}} + \lambda s_j = 0$, which implies $|\hat{\beta}_j^{\text{OLS}}| \leq \frac{\lambda}{2}$, since $s_j \in [-1, 1]$. Since this is a convex program, the KKT conditions are sufficient, and the condition works both ways; that is, $|\hat{\beta}_j^{\text{OLS}}| \leq \frac{\lambda}{2} \implies \hat{\beta}_j^{\text{lasso}} = 0$.
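A numerical check of this two-way equivalence (a sketch, not part of the answer; it relies on the documented fact that scikit-learn's `Lasso` minimizes $\frac{1}{2n}\|y - X\beta\|_2^2 + \alpha\|\beta\|_1$, so $\alpha = \lambda/(2n)$ matches the objective above):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p, lam = 100, 8, 1.0

# Orthonormal design (X'X = I), so the OLS estimate is just X'y.
X, _ = np.linalg.qr(rng.normal(size=(n, p)))
y = rng.normal(size=n)
beta_ols = X.T @ y

# sklearn minimizes (1/(2n))||y - Xb||^2 + alpha*||b||_1, so use alpha = lam/(2n).
fit = Lasso(alpha=lam / (2 * n), fit_intercept=False, tol=1e-10).fit(X, y)

# Expected: soft-thresholding of the OLS coefficients at lam/2.
expected = np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam / 2, 0.0)
print(np.allclose(fit.coef_, expected, atol=1e-6))          # True
print((np.abs(beta_ols) <= lam / 2) == (fit.coef_ == 0.0))  # all True
```

Coefficients with $|\hat{\beta}_j^{\text{OLS}}| \le \lambda/2$ are exactly the ones the solver zeroes out.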