Solved – Multi-class probabilities of Random Forest inside caret model

I'm facing a problem with the results of a multi-class random forest model.

I want to use a) the predictions of the model and b) the class probabilities of these predictions for further work.

I ran a cross-validation grouped by a variable (hh_id) that I dropped right afterwards, and trained a multi-class model with the following code:

    library(caret)

    # build 5 folds so that all rows sharing an hh_id land in the same fold
    folds5 <- groupKFold(feature_data$hh_id, k = 5)

    # remove group variable
    feature_data <- feature_data[, !names(feature_data) == "hh_id"]

    fitControl <- trainControl(method = "cv",
                               number = 5,
                               index = folds5,
                               sampling = "down",
                               savePredictions = TRUE)

    set.seed(1)
    rf_mod <- train(class ~ ., feature_data,
                    method = "rf",
                    norm.votes = TRUE,
                    # predict.all = FALSE,
                    type = "Classification",
                    metric = "Accuracy",
                    ntree = 500,
                    trControl = fitControl)

My result is an accuracy of approx. 40%, which is reasonable for this case. This is the confusion matrix:

    Confusion Matrix and Statistics

              Reference
    Prediction   1   2   3   4   5
             1 245 399  61  57  37
             2 171 962 162 206  91
             3  50 456 131 130  51
             4  36 352  95 395 167
             5  67 182  42 263 152

    Overall Statistics

                   Accuracy : 0.38

My first thought was to continue with the function predict(..., type = "prob") to get the probabilities.
Predicting on the training data, however, makes the accuracy go up to about 80%. I suppose these results are wrong, because the same data was also used for learning.

    predict_rf_model <- predict(rf_mod)
    caret::confusionMatrix(predict_rf_model, feature_data$class)

              Reference
    Prediction    1    2    3    4    5
             1  558  190    0   13    0
             2    8 1658    0   45    0
             3    1  221  491   54    2
             4    1  185    0  886    1
             5    1   97    0   53  495

    Overall Statistics

                   Accuracy : 0.8242
                     95% CI : (0.8133, 0.8347)

This means I cannot use predict() to get the class probabilities.

I then looked for fields inside my model rf_mod and found some promising ones:

  • rf_mod$pred saves the predictions for all held-out samples if you set savePredictions in trainControl. That gives me all predicted classes, which is nice.

  • There is a field rf_mod$finalModel$votes, which stores the class probabilities (5 classes):

    > rf_mod$finalModel$votes
                1           2           3           4           5
    1 0.521505376 0.021505376 0.010752688 0.064516129 0.381720430
    2 0.865979381 0.072164948 0.020618557 0.005154639 0.036082474
    3 0.873626374 0.054945055 0.038461538 0.016483516 0.016483516
    ...

  • I first thought this was what I needed, but finalModel has the same or a similar confusion matrix as the predict() function, with overly optimistic(?) results.
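
On the votes field: as far as the randomForest documentation describes it, votes holds out-of-bag (OOB) vote fractions, and predict() on a randomForest object without newdata returns the OOB predictions, whereas predicting on the training rows themselves is the in-sample case. Below is a minimal sketch that compares the two on the built-in iris data (an assumption for illustration, not the data above). Note that OOB sampling knows nothing about hh_id, so rows of one household can still end up on both sides, which may be one reason the OOB numbers also look optimistic here.

```r
library(randomForest)

set.seed(1)
rf <- randomForest(Species ~ ., data = iris, ntree = 200)

# OOB vote fractions: each row is scored only by trees that did not see it
oob_votes <- rf$votes
oob_pred  <- predict(rf)                 # no newdata: OOB predictions

# In-sample predictions: all trees vote, including those trained on the row
in_pred <- predict(rf, newdata = iris)

oob_acc <- mean(oob_pred == iris$Species)
in_acc  <- mean(in_pred == iris$Species)
c(oob = oob_acc, in_sample = in_acc)
```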

Where can I get the classifier probabilities, like those in rf_mod$finalModel$votes?
There might be another parameter for getting the probabilities that I am too dumb to figure out.

Any other solution to get class probabilities with grouped cross validation is also appreciated.
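
One option that should fit this setup is to let caret store the held-out class probabilities itself: with classProbs = TRUE and savePredictions set in trainControl, the pred field of the fitted model gets one probability column per class next to pred, obs and rowIndex. A minimal sketch on synthetic data follows; the data frame toy, its columns x1 and x2, and the class labels c1 to c3 are made up for illustration. caret requires class levels that are valid R names for this, so numeric labels such as 1 to 5 would first need something like make.names(). The sampling = "down" option from the setup above is left out only to keep the sketch short.

```r
library(caret)   # method = "rf" also needs the randomForest package installed

set.seed(1)
n <- 300
toy <- data.frame(hh_id = rep(1:60, each = 5),   # hypothetical household id
                  x1 = rnorm(n),
                  x2 = rnorm(n))
# Class levels must be valid R names when classProbs = TRUE
toy$class <- cut(toy$x1 + rnorm(n), breaks = c(-Inf, -0.5, 0.5, Inf),
                 labels = c("c1", "c2", "c3"))

folds <- groupKFold(toy$hh_id, k = 5)   # training indices, grouped by household
hh_id <- toy$hh_id                      # keep the ids before dropping the column
toy$hh_id <- NULL

ctrl <- trainControl(method = "cv",
                     index = folds,
                     classProbs = TRUE,          # store class probabilities
                     savePredictions = "final")  # held-out rows, best tuning only

fit <- train(class ~ ., data = toy,
             method = "rf",
             ntree = 100,
             tuneGrid = data.frame(mtry = 1),
             trControl = ctrl)

# One row per held-out sample, with one probability column per class
oof <- fit$pred[order(fit$pred$rowIndex), ]
oof$hh_id <- hh_id[oof$rowIndex]        # rowIndex points into the training data
head(oof[, c("rowIndex", "hh_id", "obs", "pred", "c1", "c2", "c3", "Resample")])
```

Because every row is held out in exactly one of the grouped folds, these probabilities come from models that never saw the row's household, unlike predict() on the training data.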

For context, in the next step I want to combine the classifier results by hh_id. Information about the probabilities could improve the results.
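
The combination step could be sketched as follows, assuming held-out probabilities are already available per row together with their hh_id. The data frame oof and its values are hypothetical, and averaging the probabilities per household (soft voting) is just one possible rule, not necessarily the best one for this data.

```r
# Hypothetical held-out class probabilities, one row per sample
oof <- data.frame(
  hh_id = c(1, 1, 1, 2, 2),
  X1 = c(0.6, 0.2, 0.5, 0.1, 0.3),
  X2 = c(0.3, 0.5, 0.4, 0.2, 0.3),
  X3 = c(0.1, 0.3, 0.1, 0.7, 0.4)
)
prob_cols <- c("X1", "X2", "X3")

# Mean probability per class within each household
hh_prob <- aggregate(cbind(X1, X2, X3) ~ hh_id, data = oof, FUN = mean)

# Household-level class: the column with the highest mean probability
best <- max.col(as.matrix(hh_prob[, prob_cols]), ties.method = "first")
hh_prob$pred <- prob_cols[best]
hh_prob
```

A plain majority vote over the predicted classes would ignore how confident each single prediction was, which is the information the probabilities add.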

Thank you in advance!
