Solved – Interpretation of Logistic Regression Model Using Glmnet()

In a linear regression model, say I want to see the relationship between car_speed and the number of accidents.

That is easy for me to understand. Now for the logistic regression model, my response variable is either 0 or 1.

How do I understand this model?

> # Prep Training and Test data.
> trainDataIndex <- sample(1:nrow(df), 0.7*nrow(df))  # 70% training data
> trainData <- df[trainDataIndex, ]
> testData <- df[-trainDataIndex, ]
> set.seed(100)
> trainData <-
+   trainData %>%
+   dplyr::mutate(CUST_REGION_DESCR =
+                   forcats::fct_relabel(CUST_REGION_DESCR, ~ trimws(.x)))
> testData <-
+   testData %>%
+   dplyr::mutate(CUST_REGION_DESCR =
+                   forcats::fct_relabel(CUST_REGION_DESCR, ~ trimws(.x)))
> str(trainData)
'data.frame':   693843 obs. of  4 variables:
 $ cust_prog_level  : Factor w/ 14 levels "B","C","D","E",..: 9 7 10 9 10 9 10 5 10 5 ...
 $ CUST_REGION_DESCR: Factor w/ 8 levels "CORPORATE REGION",..: 2 6 7 6 8 8 4 7 7 6 ...
 $ Sales            : num  92.7 2356 39 239.6 26 ...
 $ New_Product_Type : Factor w/ 2 levels "0","1": 1 1 1 1 2 1 1 1 1 1 ...
> str(testData)
'data.frame':   297362 obs. of  4 variables:
 $ cust_prog_level  : Factor w/ 14 levels "B","C","D","E",..: 9 5 9 9 9 9 3 3 5 3 ...
 $ CUST_REGION_DESCR: Factor w/ 8 levels "CORPORATE REGION",..: 3 3 6 6 7 6 7 2 2 4 ...
 $ Sales            : num  150.2 68.5 68.1 72.1 60.1 ...
 $ New_Product_Type : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...

> x = model.matrix(New_Product_Type ~ ., data = trainData)
> cvfit = cv.glmnet(x, y = as.factor(trainData$New_Product_Type), alpha = 1, family = "binomial", type.measure = "mse")
> lambda_1se <- cvfit$lambda.1se
> coef(cvfit, s = lambda_1se)
23 x 1 sparse Matrix of class "dgCMatrix"
                                                 1
(Intercept)                             0.02946581
(Intercept)                             .
cust_prog_levelC                        0.14012975
cust_prog_levelD                        .
cust_prog_levelE                        0.13339906
cust_prog_levelG                       -0.05325043
cust_prog_levelI                        0.21440592
cust_prog_levelL                        0.26273503
cust_prog_levelM                        .
cust_prog_levelN                        0.26620261
cust_prog_levelP                       -0.05166799
cust_prog_levelR                       -0.33054803
cust_prog_levelS                        .
cust_prog_levelX                        0.57508875
cust_prog_levelZ                        1.20748454
CUST_REGION_DESCRMOUNTAIN WEST REGION  -0.20993854
CUST_REGION_DESCRNORTH CENTRAL REGION  -0.04035331
CUST_REGION_DESCRNORTH EAST REGION      0.01082858
CUST_REGION_DESCROHIO VALLEY REGION     0.03077584
CUST_REGION_DESCRSOUTH CENTRAL REGION   .
CUST_REGION_DESCRSOUTH EAST REGION      0.10606213
CUST_REGION_DESCRWESTERN REGION        -0.17587036
Sales                                  -0.01223843

> # get test data
> x_test <- model.matrix(New_Product_Type ~ ., data = testData)
> # predict New_Product_Type, type = "New_Product_Type"
> lasso_prob <- predict(cvfit, newx = x_test, s = lambda_1se, type = "response")
> # translate probabilities to predictions
> lasso_predict <- rep("0", nrow(testData))
> lasso_predict[lasso_prob > .5] <- "1"
> # confusion matrix
> table(pred = lasso_predict, true = testData$New_Product_Type)
    true
pred      0      1
   0 207345  60553
   1   9004  20460
> # accuracy
> mean(lasso_predict == testData$New_Product_Type)
[1] 0.7660865

Specifically, 0 means "the customer does not buy this product (they buy others)" and 1 means "they buy the house product". There are three predictors in this model.

How can I interpret the result of summary(cvfit)?

The mean() call returns 0.7660865. What does that imply?

You interpret the coefficient estimates from glmnet the same way you would interpret those from a regular GLM logistic regression: each coefficient is the change in the log-odds of the response being 1 per unit increase in that predictor (or for that factor level relative to the baseline level). There are plenty of resources on this site and online for interpreting logistic regression coefficients. The coefficients shown as . were set to exactly 0 by the lasso penalty (it's lasso because you set alpha = 1).
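For instance, you can read the surviving coefficients on the odds scale. This is a minimal sketch that reuses your cvfit and lambda_1se objects:

cf <- as.matrix(coef(cvfit, s = lambda_1se))  # sparse coefficient matrix -> ordinary matrix
cf <- cf[cf[, 1] != 0, , drop = FALSE]        # keep only terms the lasso did not shrink to 0
exp(cf)                                       # exponentiate log-odds coefficients into odds ratios
# e.g. exp(1.2075) is about 3.34: a customer with cust_prog_level "Z" has roughly
# 3.3 times the odds of New_Product_Type = 1 compared with the baseline level "B",
# holding the other predictors fixed.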

Your last statement computes the proportion of predictions that match the true label, i.e. the model accuracy, which you can also get from the confusion matrix above it:

(207345 + 20460) / (207345 + 20460 + 60553 + 9004) = 0.7660865
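If you want to recompute that number yourself, it falls straight out of the confusion matrix. A small sketch, assuming lasso_predict and testData from the question are still in the workspace:

cm <- table(pred = lasso_predict, true = testData$New_Product_Type)  # rebuild the confusion matrix
sum(diag(cm)) / sum(cm)  # correct predictions / all predictions = (207345 + 20460) / 297362 = 0.7660865

In other words, the model predicts the correct class for about 77% of the test customers.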
