Solved – KKT versus unconstrained formulation of lasso regression

L1 penalized regression (aka lasso) is presented in two formulations. Let the two objective functions be
Q_1 = frac{1}{2}||Y – Xbeta||_2^2 \
Q_2 =frac{1}{2}||Y – Xbeta||_2^2 + lambda ||beta||_1.
Then the two different formulations are
text{argmin}_beta ; Q_1
subject to
||beta||_1 leq t,
and, equivalently
text{argmin}_beta ; Q_2.
Using the Karush-Kuhn-Tucker (KKT) conditions, it's easy to see how the stationarity condition for the first formulation is equivalent to taking the gradient of the second formulation and setting it equal to 0. What I can not find, nor figure out, is how the complementary slackness condition for the first formulation, $lambdaleft(||beta||_1 – tright) = 0$, is guaranteed to be fulfilled by the solution to the second formulation.

The two formulations are equivalent in the sense that for every value of $t$ in the first formulation, there exists a value of $lambda$ for the second formulation such that the two formulations have the same minimizer $beta$.

Here's the justification:

Consider the lasso formulation: $$f(beta)=frac{1}{2}||Y – Xbeta||_2^2 + lambda ||beta||_1$$ Let the minimizer be $beta^*$ and let $b=||beta^*||_1$. My claim is that if you set $t=b$ in the first formulation, then the solution of the first formulation will also be $beta^*$. Here's the proof:

Consider the first formulation $$min frac{1}{2}||Y – Xbeta||_2^2 text{ s.t.} ||beta||_1leq b$$ If possible let this second formulation have a solution $hat{beta}$ such that $||hat{beta}||_1<||beta^*||_1=b$ (note the strictly less than sign). Then it is easy to see that $f(hat{beta})<f(beta^*)$ contradicting the fact that $beta^*$ is a solution for the lasso. Thus, the solution to the first formulation is also $beta^*$.

Since $t=b$, the complementary slackness condition is satisfied at the solution point $beta^*$.

So, given a lasso formulation with $lambda$, you construct a constrained formulation using a $t$ equal to the value of the $l_1$ norm of the lasso solution. Conversely, given a constrained formulation with $t$, you find a $lambda$ such that the solution to the lasso will be equal to the solution of the constrained formulation.

(If you know about subgradients, you can find this $lambda$ by solving the equation $X^T(y-Xbeta^*)=lambda z^*$, where $z^* in partial ||beta^*||_1)$

Similar Posts:

Rate this post

Leave a Comment