I am regressing a continuous response on over 60 predictors (both continuous and categorical) using LASSO (glmnet).
In examining the variable trace plot, I notice that as log lambda increases, one of the key variables has a coefficient that actually increases. Then, after a certain point, it begins to decrease like we would expect.
To make sure this wasn't a fluke, I ran 10 models using bootstraps and obtained very similar results.
Is this possible, or is there a problem with the data? If legitimate, what does this trend in the variable's coefficient tell us about the variable and the relation to the response?
Best Answer
It's not only possible, it's a very common occurrence.
Note that the penalty is $\lambda\,\|\beta\|_1$. So some components can increase in magnitude as long as others decrease, without increasing the norm overall. Sometimes as $\lambda$ increases, one (or a few) coefficient(s) may increase in size at the expense of others which together decrease at least as rapidly, because it helps keep down the rate of increase in the lack of fit term more than reducing them all together would.
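To make the trade-off concrete, here is a sketch of the standard piecewise-linear argument. It assumes the glmnet-style objective with the intercept ignored, and the symbols $A$ (the current active set), $s_A$ (the signs of the active coefficients) and $G_A$ are notation introduced here, not part of the question.

$$
\hat\beta(\lambda) = \arg\min_\beta \; \frac{1}{2n}\|y - X\beta\|_2^2 + \lambda\,\|\beta\|_1
$$

While the active set and signs stay fixed, the stationarity condition gives

$$
\hat\beta_A(\lambda) = G_A^{-1}\left(\tfrac{1}{n}X_A^\top y - \lambda\, s_A\right), \qquad G_A = \tfrac{1}{n}X_A^\top X_A, \qquad \frac{d\hat\beta_A}{d\lambda} = -\,G_A^{-1} s_A .
$$

So $|\hat\beta_j|$ grows with $\lambda$ exactly when $s_j\,[G_A^{-1}s_A]_j < 0$. With uncorrelated, standardized predictors $G_A$ is the identity and every active coefficient shrinks; correlation among predictors is what allows some entries of $G_A^{-1}s_A$ to take the opposite sign.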
You might like to plot what happens to $\sum_i |\beta_i|$ as $\log\lambda$ increases.
You'll often see this kind of behaviour when there's some correlation amongst the predictors – there can be a sort of substitution effect.
Note that in your top plot $|\beta_4|+|\beta_{11}|$ is pretty nearly always decreasing or fairly stable (the occasional small increase will be offset by decreases in the coefficients of still other variables).
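The substitution effect can be reproduced in a small made-up example. The sketch below is not glmnet and does not use the asker's data: it is a plain coordinate-descent lasso written against a hypothetical correlation matrix `G` for three standardized predictors (the third correlated 0.6 with each of the other two) and a hypothetical noise-free response built from `beta_true`. All names in it (`lasso_cd`, `G`, `beta_true`, `lambdas`) are illustrative assumptions.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(G, c, lam, sweeps=10000, tol=1e-12):
    """Minimise 0.5*b'Gb - c'b + lam*||b||_1 by cyclic coordinate descent.

    G plays the role of X'X/n and c the role of X'y/n, so this matches the
    usual (1/2n)*RSS + lam*||b||_1 objective up to a constant.
    """
    p = len(c)
    b = [0.0] * p
    for _ in range(sweeps):
        max_change = 0.0
        for j in range(p):
            # Partial residual correlation for coordinate j.
            rho = c[j] - sum(G[j][k] * b[k] for k in range(p) if k != j)
            new = soft_threshold(rho, lam) / G[j][j]
            max_change = max(max_change, abs(new - b[j]))
            b[j] = new
        if max_change < tol:
            break
    return b


# Hypothetical setup: predictors 1 and 2 are uncorrelated with each other,
# predictor 3 is correlated 0.6 with both of them.
G = [[1.0, 0.0, 0.6],
     [0.0, 1.0, 0.6],
     [0.6, 0.6, 1.0]]
beta_true = [1.0, 1.0, 0.2]  # assumed coefficients, no noise
c = [sum(G[j][k] * beta_true[k] for k in range(3)) for j in range(3)]

lambdas = [0.0, 0.175, 0.35, 0.525, 0.7, 1.05, 1.4]
path = [(lam, lasso_cd(G, c, lam)) for lam in lambdas]

for lam, b in path:
    l1 = sum(abs(v) for v in b)
    print(f"lambda={lam:5.3f}  beta={[round(v, 4) for v in b]}  L1={l1:.4f}")
```

In this constructed case the third coefficient should climb as $\lambda$ grows while the first two shrink faster, so the printed L1 norm still falls at every step; once the first two reach zero, the third starts shrinking as well. That is the same rise-then-fall shape described in the question.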