I am running a lasso regression function. I have about 45 features and I am predicting 1 dependent variable. After running lasso regression I get the coefficient values of the features.
If I look at the magnitude of the coefficients, do they tell me how important the respective feature was for prediction? For example, does a feature with a coefficient of 100 have more predictive power/importance than one with a coefficient of 20 or 0?
Best Answer
You cannot compare the values of coefficients in this way. Suppose that your response $Y$ is measured in meters, and you have two features $X_1$ and $X_2$ which are measured in seconds and hours respectively. Then your coefficients have different units: $\beta_1$ has units meters/second and $\beta_2$ has units meters/hour, so they are not directly comparable. Even worse is if $X_1$ is measured in seconds but $X_2$ is something totally unrelated, say ohms, coulombs, newtons or lumens.
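To make the units point concrete, here is a minimal sketch with simulated data. It uses ordinary least squares via NumPy rather than lasso, purely to keep the example dependency-free, and the variable names, true slope and noise level are all made up for illustration. The same feature is expressed once in seconds and once in hours; the fitted relationship is identical, but the coefficient changes by a factor of 3600.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical feature measured in seconds, response in meters.
t_seconds = rng.uniform(0, 7200, size=n)
y = 0.5 * t_seconds + rng.normal(0, 1.0, size=n)


def ols_slope(x, y):
    """Fit y = a + b*x by least squares and return the slope b."""
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[1]


slope_per_second = ols_slope(t_seconds, y)        # units: meters/second
slope_per_hour = ols_slope(t_seconds / 3600.0, y)  # units: meters/hour

# Same data, same fit, but the coefficient is 3600 times larger.
ratio = slope_per_hour / slope_per_second
print(slope_per_second, slope_per_hour, ratio)
```

Nothing about the feature's usefulness changed between the two fits; only the unit did.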
Now, when doing lasso regression, it is standard practice to standardize the columns in the design matrix, which essentially makes all the predictors dimensionless (though when the coefficients are reported back to the user, they are usually stated on the original scale). You still cannot compare the magnitudes in any reasonable way. A simple way to see this is to consider the following situation:
$$ \begin{align*} Y &= X_1 + X_2 + \epsilon \\ \operatorname{corr}(X_1, X_2) &= 1 \end{align*} $$
Any of the following regression models is correct:
$$ \begin{align*} E(Y \mid X_1, X_2) &= X_1 + X_2 \\ E(Y \mid X_1, X_2) &= 2 X_1 \\ E(Y \mid X_1, X_2) &= 2 X_2 \\ E(Y \mid X_1, X_2) &= 0.5 X_1 + 1.5 X_2 \end{align*} $$
and so on. Of course, situations found "in nature" are never this clear cut, but this illustrates the essential difficulties in your proposal.
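A quick numerical check of the perfectly correlated case, as a sketch only: it assumes the two features are standardized, so that a correlation of 1 means the columns are literally identical, and the data is simulated. Each coefficient pair listed above produces exactly the same predictions, so the data cannot tell you which pair is "right".

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: standardized features with correlation 1, i.e. identical columns.
x1 = rng.normal(size=100)
x2 = x1.copy()
X = np.column_stack([x1, x2])

# The coefficient pairs (beta_1, beta_2) from the models above.
candidates = [(1.0, 1.0), (2.0, 0.0), (0.0, 2.0), (0.5, 1.5)]

# Predictions under each pair; they all equal 2 * x1.
preds = [X @ np.array(b) for b in candidates]
all_same = all(np.allclose(preds[0], p) for p in preds[1:])
print(all_same)
```

With features that are strongly but not perfectly correlated, the coefficients are identified in principle, but how the weight is split between them can still be quite arbitrary.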