Since I'm relatively new to regularized regression, I'm concerned about the huge differences in the results that lasso, ridge, and elastic net deliver.

My data set has the following characteristics:

- panel data set: > 900,000 observations and over 50 variables
- highly unbalanced
- 2-5 variables are highly correlated

To select only a subset of the variables, I used penalized logistic regression, fitting the model:

$\frac{1}{N} \sum_{i=1}^{N} L(\beta, X, y) - \lambda \left[ (1-\alpha) \|\beta\|_2^2 / 2 + \alpha \|\beta\|_1 \right]$

To determine the optimal $\lambda$ I used cross-validation, which yields the following results:
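For reference, a minimal sketch of this kind of comparison in Python with scikit-learn, on synthetic data standing in for the panel data set (the data-generating parameters and variable names here are illustrative assumptions, not taken from the question); `l1_ratio` plays the role of $\alpha$ in the penalty above:

```python
# Hedged sketch: cross-validated penalized logistic regression on
# synthetic data (a stand-in for the real panel data set).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=5, random_state=0)

# l1_ratio corresponds to alpha: 0 -> ridge, 1 -> lasso, between -> elastic net.
counts = {}
for name, l1_ratio in [("ridge", 0.0), ("elastic net", 0.5), ("lasso", 1.0)]:
    model = LogisticRegressionCV(
        penalty="elasticnet", solver="saga",
        l1_ratios=[l1_ratio], Cs=10, cv=5, max_iter=5000,
    ).fit(X, y)
    counts[name] = int(np.sum(model.coef_ != 0))
    print(name, "keeps", counts[name], "nonzero coefficients")
```

Even on such a toy data set, the lasso typically zeroes out some coefficients while ridge keeps all of them, which mirrors the 2-vs-34 discrepancy described below.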

The elastic net looks quite similar to the lasso, also proposing only 2 variables.

So my main question is: why do these approaches deliver so different results?

According to the lasso, I have only 2 variables in the final model, while according to the ridge I have 34 variables.

So in the end – which approach is the right one?

And why are the results so extremely different?

Thanks a lot!


#### Best Answer

By mean squared error, do you mean the Brier score? And for the elastic net the plot should be three-dimensional, since there are two simultaneous penalty parameters. Don't force $\alpha$ to be 0 or 1.
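In case the terminology is unfamiliar: the Brier score is simply the mean squared error of the predicted probabilities against the 0/1 outcomes. A minimal sketch (the function name and example values are my own, not from the question):

```python
# The Brier score: mean squared difference between predicted
# probabilities and the observed 0/1 outcomes.
import numpy as np

def brier_score(y_true, p_pred):
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    return float(np.mean((p_pred - y_true) ** 2))

# e.g. brier_score([1, 0, 1], [0.9, 0.2, 0.6]) -> 0.07
```

Lower is better; a model that always predicted the base rate would serve as a natural benchmark.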

To answer your question: the lasso is spending information trying to be parsimonious, while a quadratic penalty is not trying to select features at all but is just trying to predict accurately. It is a fool's errand to expect that a typical problem will result in a parsimonious model that is also highly discriminating. In addition, the lasso is not stable, i.e., if you were to repeat the experiment, the list of selected features would vary quite a lot.
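That instability is easy to demonstrate by refitting the lasso on bootstrap resamples and counting how many distinct feature sets it selects. A hedged sketch on synthetic data with deliberately correlated features (all parameters here are illustrative assumptions; in practice you would substitute your own `X`, `y`):

```python
# Hedged sketch: lasso feature-selection instability under bootstrap
# resampling (synthetic data with correlated/redundant features).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_redundant=5, random_state=0)

selected_sets = []
for _ in range(30):
    idx = rng.integers(0, len(y), len(y))  # bootstrap resample
    m = LogisticRegression(penalty="l1", solver="liblinear",
                           C=0.1, max_iter=2000).fit(X[idx], y[idx])
    selected_sets.append(frozenset(np.flatnonzero(m.coef_)))

# How many distinct "final models" did the lasso produce?
n_distinct = len(set(selected_sets))
print(n_distinct, "distinct feature sets across 30 resamples")
```

With correlated predictors, the lasso tends to pick one representative of a correlated group more or less arbitrarily, so the selected set shifts from resample to resample.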

For optimum prediction use ridge logistic regression. Elastic net is a nice compromise between that and lasso.
