I am running LASSO regression selection models using cv.glmnet(). The outcome is the incidence of a disease, and I have 63 covariates to include.
Of these 63 covariates, I force three to be included in the model by setting the penalty factor to 0.
The results look good and always include the three forced (unpenalized) variables plus a few of the remaining 60 covariates.
A colleague has now suggested that I run the model as before, but choose the lambda so that the solution always includes 5 additional covariates (plus the three unpenalized ones).
How do I do that?
He suggested telling the model to select something like
min(lambda[which(n.var==x)])
But I don't know how to build it into the model.
Here is what I've got:
cvfit = cv.glmnet(x, y, family="cox", penalty.factor=pen)
coef.min = coef(cvfit, s = "lambda.min", penalty.factor=pen)
I assume that for the model to include my unpenalized covariates plus exactly 5 of the remaining 60, I have to set the "lambda=" argument in cv.glmnet()?
Can anyone please help?
Best Answer
I don't think that your colleague had anything fancy in mind: fit a glmnet model with cross-validation as you ordinarily would, and then examine how many nonzero features you have at each value of $\lambda$. When you have 5 (or however many) nonzero features, that's the value of $\lambda$ to choose.
glmnet even keeps track of this automatically for you. If lassoFit is your cv.glmnet object, then lassoFit$nzero counts the number of nonzero entries at each value of lambda in the sequence. They occur in the order of the lambda sequence.
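As a minimal sketch of that lookup, the code below uses simulated stand-in data (the x, y, and pen built here are hypothetical placeholders for your own objects). It assumes that nzero counts every nonzero coefficient, including the unpenalized ones, so the target is 3 + 5 = 8 rather than 5; it is worth checking that on your own fit. When several lambdas give the same count, it takes the smallest, as your colleague suggested.

```r
library(glmnet)

# Simulated stand-in data (hypothetical; replace with your own x, y, pen)
set.seed(1)
n <- 200
p <- 63
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
time <- rexp(n, rate = exp(0.5 * x[, 1] - 0.5 * x[, 4]))
status <- rbinom(n, size = 1, prob = 0.7)
y <- cbind(time = time, status = status)  # two-column Cox response

# First three covariates are forced in (penalty factor 0)
pen <- c(rep(0, 3), rep(1, p - 3))

cvfit <- cv.glmnet(x, y, family = "cox", penalty.factor = pen)

# Assumption: nzero includes the unpenalized covariates in its count
n_forced <- sum(pen == 0)
n_extra <- 5
target <- n_forced + n_extra

# Positions in the lambda sequence with exactly `target` nonzero coefficients
idx <- which(cvfit$nzero == target)

if (length(idx) > 0) {
  # Smallest lambda among those giving the target count
  lambda_sel <- min(cvfit$lambda[idx])
  coef_sel <- as.matrix(coef(cvfit, s = lambda_sel))
  # Names of the covariates kept at this lambda
  print(rownames(coef_sel)[coef_sel[, 1] != 0])
} else {
  # The path can skip a count, so an exact match is not guaranteed
  lambda_sel <- NA_real_
  message("No lambda in the sequence gives exactly ", target,
          " nonzero coefficients; inspect cvfit$nzero and pick a neighbor.")
}
```

The selected lambda_sel is then passed as s= to coef() or predict(), so there is no need to refit with the lambda= argument. Since this lambda is chosen by model size rather than by cross-validated error, it may be worth comparing its cvfit$cvm value with the one at lambda.min.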