I'm trying to model membership in one of three well-being clusters (flourisher, normative, languisher) from a set of predictors, using the elastic net for both variable selection and modelling. I first use the caret package together with the glmnet package to run a 10-fold cross-validation of a multinomial logistic regression and find the optimal values of $\alpha$ and $\lambda$:

    library(caret)
    library(glmnet)

    set.seed(123456)
    # "repeatedcv" with caret's defaults means 10 folds and 1 repeat
    elastic_train <- train(
      cluster_membership ~ .,
      data = data_multi,
      method = "glmnet",
      tuneLength = 25,
      trControl = trainControl(method = "repeatedcv", search = "random")
    )

After that, I refit a glmnet model with the optimal $\alpha$ and $\lambda$ and extract its coefficients:
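
    # Use the tuned values stored in the caret object created above (elastic_train)
    elastic_mod <- glmnet(
      model.matrix(cluster_membership ~ ., data = data_multi),
      data_multi$cluster_membership,
      family = "multinomial",
      alpha = elastic_train$bestTune$alpha,
      lambda = elastic_train$bestTune$lambda
    )
    # The model holds a single lambda, so request the coefficients at that value
    coef(elastic_mod, s = elastic_train$bestTune$lambda)
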
My question is: why are there three sets of predictor coefficients, and how do I interpret them? I understand that in ordinary multinomial regression one of the outcome categories serves as the reference level, so there are $n_{\text{category}} - 1$ sets of coefficients (in my case, one set for flourisher vs. normative and another for languisher vs. normative). That is not the case in the glmnet multinomial output, which clearly contains three sets:

    $flourisher
    24 x 1 sparse Matrix of class "dgCMatrix"
                                            1
    (Intercept)                  -1.097799622
    (Intercept)                   .
    genderFemale                  .
    ethnicAsian                  -0.209824802
    ethnicMaori/Pacific Islander  .
    ethnicOther                  -0.976265266
    age                           .
    BMI                           .
    BMR_kcal                      .
    CRPmgL                        .
    md_ndrink                     .
    md_refresh                    0.262365972
    md_sleep                      .
    md_dstress                   -0.135152345
    md_dwrkld                     .
    md_dcreate                   -0.028441041
    md_dpac                       0.007127653
    md_dfruit                     .
    md_dveg                       .
    md_dchips                    -0.091275151
    md_dsweets                    .
    md_dsoftdrk                  -0.069636339
    md_dmood_happy                1.514874739
    md_dmood_sad                 -1.167555384

    $normative
    24 x 1 sparse Matrix of class "dgCMatrix"
                                          1
    (Intercept)                  1.10289610
    (Intercept)                  .
    genderFemale                 0.11272708
    ethnicAsian                  0.06757730
    ethnicMaori/Pacific Islander .
    ethnicOther                  .
    age                          .
    BMI                          .
    BMR_kcal                     .
    CRPmgL                       .
    md_ndrink                    .
    md_refresh                   .
    md_sleep                     .
    md_dstress                   0.01914515
    md_dwrkld                    .
    md_dcreate                   .
    md_dpac                      .
    md_dfruit                    .
    md_dveg                      .
    md_dchips                    .
    md_dsweets                   .
    md_dsoftdrk                  .
    md_dmood_happy               .
    md_dmood_sad                 .

    $languisher
    24 x 1 sparse Matrix of class "dgCMatrix"
                                            1
    (Intercept)                  -0.005096477
    (Intercept)                   .
    genderFemale                  .
    ethnicAsian                   .
    ethnicMaori/Pacific Islander  .
    ethnicOther                   0.037015029
    age                           .
    BMI                           0.105868542
    BMR_kcal                      .
    CRPmgL                        .
    md_ndrink                    -0.027819620
    md_refresh                   -0.020768855
    md_sleep                      .
    md_dstress                    .
    md_dwrkld                     .
    md_dcreate                    0.121300148
    md_dpac                      -0.016322027
    md_dfruit                     .
    md_dveg                       .
    md_dchips                     .
    md_dsweets                    .
    md_dsoftdrk                   .
    md_dmood_happy               -0.032887466
    md_dmood_sad                  1.028937162

How do I interpret the predictors for each category? Do I understand it correctly that there's no concrete reference category?
Best Answer
I emailed Dr. Hastie, the maintainer of the glmnet package, and he kindly sent the following answer:
In the traditional case, the base category is arbitrary. In fact you can take a fitted model where say category one is the base category, and simply by subtraction of coefficients, make an equivalent model where another is the base (and the fit is identical). (Care must be taken with the standard errors).
Concretely, if category 1 is the base and you have a coefficient vector $\beta_k$ for category $k$, $k = 2, \ldots, K$ (with $\beta_1 = 0$), you can make, say, category $K$ the base. In this case the new coefficients would be $\beta'_k = \beta_k - \beta_K$, and the fitted probabilities would be unchanged.
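A short derivation may help show why the subtraction leaves the fit unchanged. It assumes the usual softmax form of the multinomial model, with the intercept absorbed into $x$; the symbol $p_k(x)$ for the fitted probability of category $k$ is notation introduced here, not taken from the email.

$$
p_k(x) = \frac{e^{x^\top \beta_k}}{\sum_{l=1}^{K} e^{x^\top \beta_l}},
\qquad
\frac{e^{x^\top (\beta_k - \beta_K)}}{\sum_{l=1}^{K} e^{x^\top (\beta_l - \beta_K)}}
= \frac{e^{-x^\top \beta_K}\, e^{x^\top \beta_k}}{e^{-x^\top \beta_K} \sum_{l=1}^{K} e^{x^\top \beta_l}}
= p_k(x).
$$

The common factor $e^{-x^\top \beta_K}$ cancels, so any vector subtracted from all $K$ coefficient vectors gives the same probabilities. The choice of base category only decides which vector is pinned at zero.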
With glmnet we chose a symmetric option instead, because we use regularization. With regularization, it would matter and make a difference if you used an asymmetric representation because of the way the shrinking works.
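A small worked example may make the asymmetry concrete. It assumes the ungrouped elastic net penalty as given in the glmnet documentation, written here as $P_\alpha$, where $\beta_{jk}$ is the coefficient of feature $j$ for category $k$; the three coefficient values are made up for illustration.

$$
P_\alpha(\beta) = \sum_{j} \sum_{k=1}^{K} \left[ \frac{1-\alpha}{2}\, \beta_{jk}^2 + \alpha\, |\beta_{jk}| \right]
$$

Take one feature with coefficients $(1, 0, -1)$ across three categories. Rebasing on category 3 gives $(2, 1, 0)$, which produces the same fitted probabilities, but the penalty differs:

$$
\text{lasso } (\alpha = 1):\; |1| + |0| + |{-1}| = 2 \;\text{ vs. }\; |2| + |1| + |0| = 3,
\qquad
\text{ridge } (\alpha = 0):\; \tfrac{1}{2}(1 + 0 + 1) = 1 \;\text{ vs. }\; \tfrac{1}{2}(4 + 1 + 0) = 2.5.
$$

The likelihood cannot tell these two parameterizations apart, but the penalty can, so with a fixed base category the amount of shrinkage would depend on which category was chosen as the base.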
I like the type.multinomial = "grouped" option. In this case a group lasso penalty is applied to the set of coefficients for each feature, and the estimated coefficients average 0.
Again, you can post hoc move to an asymmetric representation as above without changing the fitted model.
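The post hoc move to a reference-category representation can be sketched in base R. The snippet below is only an illustration under stated assumptions: it copies four rows of the symmetric coefficients from the output in the question (the remaining rows are left out for brevity, so the probabilities are not those of the full model), the predictor values in `x_new` are hypothetical, and `class_probs` is a helper defined here, not a glmnet function.

```r
# Symmetric coefficients copied from the printed output in the question.
# Only four rows are used; "." entries in the output are zeros.
B_sym <- cbind(
  flourisher = c(`(Intercept)` = -1.097799622, md_dstress = -0.135152345,
                 md_dmood_happy = 1.514874739, md_dmood_sad = -1.167555384),
  normative  = c(1.10289610, 0.01914515, 0, 0),
  languisher = c(-0.005096477, 0, -0.032887466, 1.028937162)
)

# Softmax probabilities for one observation x (first element 1 = intercept).
class_probs <- function(B, x) {
  eta <- drop(x %*% B)           # one linear predictor per class
  exp(eta) / sum(exp(eta))
}

# Make "normative" the reference: subtract its column from every column.
# The normative column becomes exactly zero, as in ordinary multinomial output.
B_ref <- B_sym - B_sym[, "normative"]

# Hypothetical predictor values: intercept, md_dstress, md_dmood_happy, md_dmood_sad
x_new <- c(1, 2, 4, 1)

p_sym <- class_probs(B_sym, x_new)
p_ref <- class_probs(B_ref, x_new)
print(all.equal(p_sym, p_ref))   # the two representations give the same fit

# In the reference form, exp(coefficient) reads as an odds ratio versus normative
print(exp(B_ref["md_dmood_happy", "flourisher"]))
```

After the subtraction, each non-reference column can be read like ordinary multinomial logistic coefficients: a log-odds change relative to normative per unit increase in the predictor. Keep in mind that these are penalized estimates, so they are shrunk and come without the usual standard errors.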