I am doing multiple regression in one of my projects. The input design matrix $X$ has correlated columns. Nevertheless, there are a few 'good' predictors in $X$: if I fit each of the good predictors on its own with OLS, each of them gives a good prediction result.

However, when I put all 20 columns together into LASSO, the out-of-sample prediction becomes very bad.

My suspicion is that LASSO cannot handle the multicollinearity of $X$.

In such a scenario, what's the best thing to try next?

Any insights?


#### Best Answer

LASSO is known to provide unstable solutions in the case of collinear features, or in situations where one has more features than observations. LASSO's objective function (like that of unpenalised least squares) has no unique solution when two or more features contain the same information, and its solution becomes unstable when they contain very similar information. An extreme case of this would be having a feature $x_i$ used twice as $x_{i^1}$ and $x_{i^2}$; which of the two variants of $x_i$ would be included in the final model would be totally arbitrary.

For that reason it is recommended to use elastic net regression instead of LASSO. Elastic net regression penalises the $L_1$ norm (like LASSO) as well as the squared $L_2$ norm (like ridge regression) of the estimated $\beta$ coefficients, leading to an objective function of the form:

$$\min_{\beta} \left\{ \frac{1}{N} \|y - X\beta\|_2^2 + \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|_2^2 \right\}$$

Importantly, the $L_2$ regularisation can be thought of as amplifying the variances along the diagonal of the features' covariance matrix; this helps alleviate (some) collinearity issues both numerically (the condition number of the covariance matrix is lowered) and conceptually (the variance of a feature $x_i$ is increased by an amount set by $\lambda_2$, but its cross-covariance with the other features $x_j$ is left unchanged).
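To make the duplicated-feature argument concrete, here is a minimal sketch in plain Python (no libraries, so it is not the `penalized` implementation or any package's API). It runs cyclic coordinate descent on the objective $\frac{1}{N}\|y - X\beta\|_2^2 + \lambda_1\|\beta\|_1 + \lambda_2\|\beta\|_2^2$ for a toy design with two identical columns. The function names, the toy data, and the values of $\lambda_1$ and $\lambda_2$ are all assumptions chosen for illustration.

```python
def soft_threshold(value, threshold):
    """Soft-thresholding operator that arises from the L1 penalty."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_cd(columns, y, lambda_1, lambda_2, order=None, sweeps=200):
    """Cyclic coordinate descent for
    (1/N)||y - X b||^2 + lambda_1 ||b||_1 + lambda_2 ||b||_2^2.
    `columns` is a list of feature columns; `order` is the order in which
    coordinates are visited in each sweep (hypothetical helper argument)."""
    n, p = len(y), len(columns)
    beta = [0.0] * p
    order = list(range(p)) if order is None else order
    for _ in range(sweeps):
        for j in order:
            # Residual of y after removing every feature except j.
            partial = [
                y[i] - sum(columns[k][i] * beta[k] for k in range(p) if k != j)
                for i in range(n)
            ]
            rho = sum(columns[j][i] * partial[i] for i in range(n)) / n
            z = sum(v * v for v in columns[j]) / n
            # Closed-form minimiser of the objective in coordinate j.
            beta[j] = soft_threshold(rho, lambda_1 / 2.0) / (z + lambda_2)
    return beta


# Toy data (assumed): the same feature entered twice, y = 2 * x exactly.
x = [-1.5, -0.5, 0.5, 1.5]
columns = [x, list(x)]
y = [2.0 * v for v in x]

# Pure LASSO (lambda_2 = 0): the result depends on which copy is visited first.
lasso_a = elastic_net_cd(columns, y, lambda_1=0.1, lambda_2=0.0, order=[0, 1])
lasso_b = elastic_net_cd(columns, y, lambda_1=0.1, lambda_2=0.0, order=[1, 0])

# Elastic net (lambda_2 > 0): the weight is shared between the two copies.
enet = elastic_net_cd(columns, y, lambda_1=0.1, lambda_2=0.5)

print("LASSO, copy 0 visited first:", lasso_a)
print("LASSO, copy 1 visited first:", lasso_b)
print("Elastic net:", enet)
```

In this toy case LASSO puts the whole coefficient (about 1.96) on whichever copy happens to be visited first and leaves the other at essentially zero, whereas the elastic net fit gives both copies the same coefficient regardless of order. Real data with merely correlated columns will behave less cleanly, but the mechanism is the same.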

If you are using R, I would suggest looking into the vignette of the package `penalized` for an excellent walk-through; I find `penalized`'s implementation of LASSO/elastic-net the cleanest to follow. The original paper on elastic net by Zou and Hastie, *Regularization and variable selection via the elastic net*, is also quite readable.

As a final note, please ensure that the features used in LASSO/ridge/elastic-net regression are normalised before being included in a model. As both $\lambda_1$ and $\lambda_2$ regularise all the features equivalently, regularising features that are measured on different scales can result in over- or under-regularising the included features.
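As a small illustration of the normalisation step, the sketch below standardises a column to zero mean and unit variance in plain Python. The helper name `standardise` and the height data are made up for the example; the point is only that the same quantity recorded in metres or in millimetres ends up identical after standardising, so the penalty no longer depends on the unit of measurement.

```python
from math import sqrt


def standardise(column):
    """Centre a column and scale it to unit (population) variance.
    Assumes the column is not constant, otherwise sd would be zero."""
    n = len(column)
    mean = sum(column) / n
    sd = sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / sd for v in column]


# Hypothetical feature recorded in two different units.
height_m = [1.60, 1.70, 1.80, 1.90]
height_mm = [1000.0 * v for v in height_m]

z_m = standardise(height_m)
z_mm = standardise(height_mm)

print(z_m)
print(z_mm)
```

When predicting out of sample, the mean and standard deviation should be computed on the training data only and then reused to transform the test data.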