Solved – cv.glmnet – choose lambda to include specific number of variables

I am running LASSO regression selection models using cv.glmnet(). The outcome is the incidence of a disease, and I have 63 covariates to include.

Of these 63 covariates, I force three to be included in the model by setting their penalty factor to 0.
The results look good and always include the three unpenalized variables plus a few of the remaining 60 covariates.
A colleague has now suggested that I run the model as I did, but choose lambda so that the solution always includes 5 additional covariates (plus the three unpenalized ones).

How do I do that?
He suggested telling the model to select something like

min(lambda[which(n.var==x)]) 

But I don't know how to build it into the model.

Here is what I've got:

cvfit = cv.glmnet(x, y, family="cox", penalty.factor=pen)
coef.min = coef(cvfit, s = "lambda.min",penalty.factor=pen)

I assume that in order for the model to include my unpenalized covariates plus exactly 5 out of the 60 remaining, I have to set the "lambda=" argument in cv.glmnet()?
Can anyone please help?

I don't think that your colleague had anything fancy in mind. Fit a glmnet model with cross-validation as you ordinarily would, and then examine how many nonzero features you have at each value of $\lambda$. When you have the number you want (5, or however many, on top of the three unpenalized covariates, which are always nonzero), that's the value of $\lambda$ to choose.

glmnet even keeps track of this automatically for you. If lassoFit is your cv.glmnet object, then lassoFit$nzero counts the number of nonzero entries at each value of lambda in the sequence. They occur in the order of the lambda sequence.
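
As a rough sketch of how that lookup could go: the data below are simulated purely for illustration, and the names n_forced, n_extra, target, lambda_k and selected are my own, not part of glmnet. It assumes that nzero counts the unpenalized covariates too, so the target is 3 + 5 = 8. The path can also jump over a given count, so the sketch falls back to the closest available count in that case.

```r
library(glmnet)

set.seed(1)
n <- 200
p <- 63

# Simulated covariates and survival outcome (illustration only)
x <- matrix(rnorm(n * p), n, p)
colnames(x) <- paste0("V", seq_len(p))
beta <- c(rep(0.5, 8), rep(0, p - 8))
time <- rexp(n, rate = exp(drop(x %*% beta)))
status <- rbinom(n, 1, 0.7)
y <- cbind(time = time, status = status)

# Penalty factor 0 keeps the first three covariates in the model
pen <- c(rep(0, 3), rep(1, p - 3))

cvfit <- cv.glmnet(x, y, family = "cox", penalty.factor = pen)

n_forced <- sum(pen == 0)    # covariates that are never shrunk to zero
n_extra <- 5                 # additional covariates wanted
target <- n_forced + n_extra # nzero includes the unpenalized ones

# Positions in the lambda sequence with exactly `target` nonzero coefficients
idx <- which(cvfit$nzero == target)
if (length(idx) == 0) {
  # The path may skip a count; fall back to the closest one available
  warning("no lambda gives exactly ", target, " nonzero coefficients")
  gap <- abs(cvfit$nzero - target)
  idx <- which(gap == min(gap))
}

# Following the colleague's suggestion: smallest lambda among the matches
lambda_k <- min(cvfit$lambda[idx])

# Coefficients at that lambda and the names of the selected covariates
coef_k <- coef(cvfit, s = lambda_k)
coef_vec <- as.matrix(coef_k)[, 1]
selected <- names(coef_vec)[coef_vec != 0]
print(selected)
```

Note that a lambda chosen this way is no longer the cross-validated optimum; you can compare cvfit$cvm at that position with the value at lambda.min to see how much CV error you give up.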
