I've read a few Q&As about this, but am still not sure I understand, why the coefficients from glmnet and caret models based on the same sample and the same hyper-parameters are slightly different. Would greatly appreciate an explanation!

I am using caret to train a ridge regression:

`library(ISLR) Hitters = na.omit(Hitters) x = model.matrix(Salary ~ ., Hitters)[, -1] #Dropping the intercept column. y = Hitters$Salary set.seed(0) train = sample(1:nrow(x), 7*nrow(x)/10) library(caret) set.seed(0) train_control = trainControl(method = 'cv', number = 10) grid = 10 ^ seq(5, -2, length = 100) tune.grid = expand.grid(lambda = grid, alpha = 0) ridge.caret = train(x[train, ], y[train], method = 'glmnet', trControl = train_control, tuneGrid = tune.grid) ridge.caret$bestTune # alpha is 0 and best lambda is 242.0128 `

Now, I use the lambda (and alpha) found above to train a ridge regression for the whole data set. At the end, I extract the coefficients:

`ridge_full <- train(x, y, method = 'glmnet', trControl = trainControl(method = 'none'), tuneGrid = expand.grid( lambda = ridge.caret$bestTune$lambda, alpha = 0) ) coef(ridge_full$finalModel, s = ridge.caret$bestTune$lambda) `

Finally, using exactly the same alpha and lambda, I try to fit the same ridge regression using glmnet package – and extract coefficients:

`library(glmnet) ridge_full2 = glmnet(x, y, alpha = 0, lambda = ridge.caret$bestTune$lambda) coef(ridge_full2) `

**Contents**hide

#### Best Answer

It seems like a bug in `caret`

's implementation.

First some notes about glmnet package:

- The documentation of
`glmnet()`

recommends against giving one single value for`lambda`

. It is preferable to warm start with large values values of`lambda`

. `predict.glmnet()`

lets you override the value(s) of`lambda`

which was used to train the model (cf`s`

argument). However, when some supplied values of`s`

differ from the original`lambda`

, the default behaviour is to use linear interpolation rather than re-fit a model with`lambda=s`

. This can be controlled with the`exact`

argument of predict().

Caret provides a `loop()`

function for its glmnet wrapper (see https://github.com/topepo/caret/blob/master/models/files/glmnet.R). This loop function will only fit one model per value of alpha, with the max corresponding value of lambda. The other lambda values are considered as "submodels". Their evaluation is deferred to `predict()`

rather than `fit()`

, using the `glmnet::predict(s=...)`

feature mentioned above. This is efficient and fine but predict should specify `exact=TRUE`

to obtain the same results as glmnet.

With this setting, I don't think that `ridge.caret$finalModel`

was effectively trained with the optimal lambda.

### Similar Posts:

- Solved – r: coefficients from glmnet and caret are different for the same lambda
- Solved – r: coefficients from glmnet and caret are different for the same lambda
- Solved – Can the alpha, lambda values of a glmnet object output determine whether ridge or Lasso
- Solved – LASSO and Dimension Reduction
- Solved – LASSO and Dimension Reduction