I am building a logistic regression model using the glmnet package:
> # Prep Training and Test data.
> trainDataIndex <- sample(1:nrow(df), 0.7*nrow(df))  # 70% training data
> trainData <- df[trainDataIndex, ]
> testData <- df[-trainDataIndex, ]
> set.seed(100)
> trainData <-
+   trainData %>%
+   dplyr::mutate(CUST_REGION_DESCR =
+     forcats::fct_relabel(CUST_REGION_DESCR, ~ trimws(.x)))
> testData <-
+   testData %>%
+   dplyr::mutate(CUST_REGION_DESCR =
+     forcats::fct_relabel(CUST_REGION_DESCR, ~ trimws(.x)))
> str(trainData)
'data.frame': 693843 obs. of 4 variables:
 $ cust_prog_level  : Factor w/ 14 levels "B","C","D","E",..: 9 7 10 9 10 9 10 5 10 5 ...
 $ CUST_REGION_DESCR: Factor w/ 8 levels "CORPORATE REGION",..: 2 6 7 6 8 8 4 7 7 6 ...
 $ Sales            : num 92.7 2356 39 239.6 26 ...
 $ New_Product_Type : Factor w/ 2 levels "0","1": 1 1 1 1 2 1 1 1 1 1 ...
> str(testData)
'data.frame': 297362 obs. of 4 variables:
 $ cust_prog_level  : Factor w/ 14 levels "B","C","D","E",..: 9 5 9 9 9 9 3 3 5 3 ...
 $ CUST_REGION_DESCR: Factor w/ 8 levels "CORPORATE REGION",..: 3 3 6 6 7 6 7 2 2 4 ...
 $ Sales            : num 150.2 68.5 68.1 72.1 60.1 ...
 $ New_Product_Type : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
> x = model.matrix(New_Product_Type ~.,data=trainData)
> cvfit = cv.glmnet(x, y=as.factor(trainData$New_Product_Type), alpha=1, family="binomial",type.measure = "mse")
> lambda_1se <- cvfit$lambda.1se
> coef(cvfit,s=lambda_1se)
23 x 1 sparse Matrix of class "dgCMatrix"
                                                1
(Intercept)                            0.02946581
(Intercept)                            .
cust_prog_levelC                       0.14012975
cust_prog_levelD                       .
cust_prog_levelE                       0.13339906
cust_prog_levelG                      -0.05325043
cust_prog_levelI                       0.21440592
cust_prog_levelL                       0.26273503
cust_prog_levelM                       .
cust_prog_levelN                       0.26620261
cust_prog_levelP                      -0.05166799
cust_prog_levelR                      -0.33054803
cust_prog_levelS                       .
cust_prog_levelX                       0.57508875
cust_prog_levelZ                       1.20748454
CUST_REGION_DESCRMOUNTAIN WEST REGION -0.20993854
CUST_REGION_DESCRNORTH CENTRAL REGION -0.04035331
CUST_REGION_DESCRNORTH EAST REGION     0.01082858
CUST_REGION_DESCROHIO VALLEY REGION    0.03077584
CUST_REGION_DESCRSOUTH CENTRAL REGION  .
CUST_REGION_DESCRSOUTH EAST REGION     0.10606213
CUST_REGION_DESCRWESTERN REGION       -0.17587036
Sales                                 -0.01223843
> #get test data
> x_test <- model.matrix(New_Product_Type~.,data = testData)
> #predict New_Product_Type, type="New_Product_Type"
> lasso_prob <- predict(cvfit,newx = x_test,s=lambda_1se,type="response")
> #translate probabilities to predictions
> lasso_predict <- rep("neg",nrow(testData))
> lasso_predict[lasso_prob>.5] <- "pos"
> #confusion matrix
> table(pred=lasso_predict,true=testData$New_Product_Type)
     true
pred       0      1
  neg 207840  60865
  pos   8697  19960
> #accuracy
> lasso_predict[lasso_prob>.8] <- "pos"
> #confusion matrix
> table(pred=lasso_predict,true=testData$New_Product_Type)
     true
pred       0      1
  neg 207840  60865
  pos   8697  19960
When I test the accuracy, the return value is 0:
> #accuracy
> mean(lasso_predict==testData$New_Product_Type)
[1] 0
So does this mean my model has ZERO accuracy?
Best Answer
If you look at your data set, your target vector is encoded as zeros and ones (a factor with levels "0" and "1"):
New_Product_Type : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
but when you make your vector of class predictions, you use a completely different encoding
lasso_predict <- rep("neg",nrow(testData))
lasso_predict[lasso_prob>.5] <- "pos"
and then you count how often these vectors are equal
mean(lasso_predict==testData$New_Product_Type)
These two vectors can never be equal, because one contains the labels "0" and "1" and the other contains the strings "pos" and "neg". So the 0 you see does not mean the model has zero accuracy; it only means the comparison is meaningless. You need to be much more careful in your programming, and use the same labels to represent the same concept in both vectors.
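To make the label mismatch concrete, here is a minimal sketch in base R. The vectors `truth` and `prob` are made-up stand-ins for `testData$New_Product_Type` and `lasso_prob`, not values from the question, and the 0.5 cutoff is simply carried over from the original code.

```r
# Toy stand-ins for the objects in the question (values are invented).
truth <- factor(c("0", "1", "0", "1", "0"), levels = c("0", "1"))
prob  <- c(0.10, 0.80, 0.60, 0.30, 0.20)  # hypothetical P(class == "1")

# Mismatched labels, as in the question: "neg"/"pos" never equal "0"/"1".
pred_bad <- ifelse(prob > 0.5, "pos", "neg")
acc_bad  <- mean(pred_bad == truth)

# Predictions encoded with the same labels and levels as the target.
pred_ok <- factor(ifelse(prob > 0.5, "1", "0"), levels = levels(truth))
acc_ok  <- mean(pred_ok == truth)

print(acc_bad)
print(acc_ok)
print(table(pred = pred_ok, true = truth))
```

With the question's objects, the equivalent change would be to fill `lasso_predict` with "0" and "1" instead of "neg" and "pos", assuming the predicted probability refers to level "1" (the second factor level).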
As a side note, this is probably not a good way to evaluate your model. Unless you have a very good reason, you should be wary of using raw accuracy to make decisions about model fit or predictive power. A quick search of this site will turn up lots of information. For example:
Why is accuracy not the best measure for assessing classification models?
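One commonly suggested alternative to accuracy is to score the predicted probabilities directly, for example with the Brier score or the log loss. The sketch below uses made-up toy vectors (`truth`, `prob`) rather than the question's data, and the clipping constant `eps` is an arbitrary choice to avoid `log(0)`.

```r
# Toy stand-ins (invented values), not the data from the question.
truth <- factor(c("0", "1", "0", "1", "0"), levels = c("0", "1"))
prob  <- c(0.10, 0.80, 0.60, 0.30, 0.20)  # hypothetical P(class == "1")

# Numeric 0/1 outcome, so it can be compared with probabilities.
y <- as.numeric(truth == "1")

# Brier score: mean squared difference between probability and outcome.
brier <- mean((prob - y)^2)

# Log loss: negative mean log-likelihood; clip to keep log() finite.
eps <- 1e-15
p <- pmin(pmax(prob, eps), 1 - eps)
log_loss <- -mean(y * log(p) + (1 - y) * log(1 - p))

print(brier)
print(log_loss)
```

Lower is better for both scores, and neither depends on choosing a classification threshold such as 0.5.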