I've read a few Q&As about this, but am still not sure I understand, why the coefficients from glmnet and caret models based on the same sample and the same hyper-parameters are slightly different. Would greatly appreciate an explanation!
I am using caret to train a ridge regression:
library(ISLR) Hitters = na.omit(Hitters) x = model.matrix(Salary ~ ., Hitters)[, -1] #Dropping the intercept column. y = Hitters$Salary set.seed(0) train = sample(1:nrow(x), 7*nrow(x)/10) library(caret) set.seed(0) train_control = trainControl(method = 'cv', number = 10) grid = 10 ^ seq(5, -2, length = 100) tune.grid = expand.grid(lambda = grid, alpha = 0) ridge.caret = train(x[train, ], y[train], method = 'glmnet', trControl = train_control, tuneGrid = tune.grid) ridge.caret$bestTune # alpha is 0 and best lambda is 242.0128
Now, I use the lambda (and alpha) found above to train a ridge regression for the whole data set. At the end, I extract the coefficients:
ridge_full <- train(x, y, method = 'glmnet', trControl = trainControl(method = 'none'), tuneGrid = expand.grid( lambda = ridge.caret$bestTune$lambda, alpha = 0) ) coef(ridge_full$finalModel, s = ridge.caret$bestTune$lambda)
Finally, using exactly the same alpha and lambda, I try to fit the same ridge regression using glmnet package – and extract coefficients:
library(glmnet) ridge_full2 = glmnet(x, y, alpha = 0, lambda = ridge.caret$bestTune$lambda) coef(ridge_full2)
Best Answer
It seems like a bug in caret
's implementation.
First some notes about glmnet package:
- The documentation of
glmnet()
recommends against giving one single value forlambda
. It is preferable to warm start with large values values oflambda
. predict.glmnet()
lets you override the value(s) oflambda
which was used to train the model (cfs
argument). However, when some supplied values ofs
differ from the originallambda
, the default behaviour is to use linear interpolation rather than re-fit a model withlambda=s
. This can be controlled with theexact
argument of predict().
Caret provides a loop()
function for its glmnet wrapper (see https://github.com/topepo/caret/blob/master/models/files/glmnet.R). This loop function will only fit one model per value of alpha, with the max corresponding value of lambda. The other lambda values are considered as "submodels". Their evaluation is deferred to predict()
rather than fit()
, using the glmnet::predict(s=...)
feature mentioned above. This is efficient and fine but predict should specify exact=TRUE
to obtain the same results as glmnet.
With this setting, I don't think that ridge.caret$finalModel
was effectively trained with the optimal lambda.
Similar Posts:
- Solved – r: coefficients from glmnet and caret are different for the same lambda
- Solved – r: coefficients from glmnet and caret are different for the same lambda
- Solved – Can the alpha, lambda values of a glmnet object output determine whether ridge or Lasso
- Solved – LASSO and Dimension Reduction
- Solved – LASSO and Dimension Reduction