I'm facing a problem with the results of a multi-class random forest model.
I want to use a) the predictions of the model and b) the class probabilities of these predictions for further work.
I did a grouped cross-validation (grouping by a variable that I drop immediately afterwards) and trained a multi-class model with the following code:
    folds5 <- groupKFold(feature_data$hh_id, k = 5)

    # remove group variable
    feature_data <- feature_data[, !names(feature_data) == "hh_id"]

    fitControl <- trainControl(method = "cv",
                               number = 5,
                               index = folds5,
                               sampling = "down",
                               savePred = T)

    set.seed(1)
    rf_mod <- train(class ~ ., feature_data,
                    method = "rf",
                    norm.votes = T,
                    #predict.all = FALSE,
                    type = "Classification",
                    metric = "Accuracy",
                    ntree = 500,
                    trControl = fitControl)
My result is an accuracy of approximately 40%, which is reasonable for this case. This is the confusion matrix:
    Confusion Matrix and Statistics

              Reference
    Prediction   1   2   3   4   5
             1 245 399  61  57  37
             2 171 962 162 206  91
             3  50 456 131 130  51
             4  36 352  95 395 167
             5  67 182  42 263 152

    Overall Statistics

             Accuracy : 0.38
My first thought for continuing was to use the function predict(..., type = "prob") to get the probabilities.
This leads to the accuracy going up to about 80%. I suppose these results are misleading, because the same data was also used for training.
    predict_rf_model <- predict(rf_mod)
    caret::confusionMatrix(predict_rf_model, feature_data$class)

              Reference
    Prediction    1    2    3    4    5
             1  558  190    0   13    0
             2    8 1658    0   45    0
             3    1  221  491   54    2
             4    1  185    0  886    1
             5    1   97    0   53  495

    Overall Statistics

                   Accuracy : 0.8242
                     95% CI : (0.8133, 0.8347)
This means I cannot use predict() to get the class probabilities.
I then tried to find fields inside my model rf_mod and found some promising ones: rf_mod$pred holds the predictions for all held-out samples, provided savePred is set in trainControl(). That gives me all predicted classes, which is nice.
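For illustration, this is roughly how I inspect what is stored there (a minimal sketch; the exact columns and their order may vary with the caret version and tuning grid):

    # hold-out predictions collected across the CV folds; with only savePred = TRUE
    # the columns include pred, obs, rowIndex, the tuning parameter (mtry) and
    # Resample, but no per-class probability columns
    str(rf_mod$pred)
    head(rf_mod$pred)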
There is also a field rf_mod$finalModel$votes, which stores the class probabilities for the 5 classes:
    > rf_mod$finalModel$votes
                1           2           3           4           5
    1 0.521505376 0.021505376 0.010752688 0.064516129 0.381720430
    2 0.865979381 0.072164948 0.020618557 0.005154639 0.036082474
    3 0.873626374 0.054945055 0.038461538 0.016483516 0.016483516
    ...
At first I thought this was what I needed, but finalModel has the same or a very similar confusion matrix as the predict() call above, i.e. the apparently inflated results.
Where can I get classifier probabilities like those in rf_mod$finalModel$votes, but for the cross-validated predictions?
There might be another parameter for getting the probabilities that I simply have not figured out.
Any other solution to get class probabilities with grouped cross validation is also appreciated.
For your interest: in the next step I want to combine the classifier results by hh_id, and information about the probabilities could improve those results.
Thank you in advance!
Best Answer
In addition to savePredictions, you should also set classProbs = TRUE in trainControl().
https://rdrr.io/cran/caret/man/trainControl.html
https://stackoverflow.com/q/36750272/10495893
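A minimal sketch of the adjusted setup (the make.names() recoding is an assumption about your data: once classProbs = TRUE, caret expects the outcome factor levels to be valid R variable names, which purely numeric labels such as 1-5 are not; the extra randomForest arguments from your original call are omitted for brevity):

    library(caret)

    # outcome levels must be valid R variable names when classProbs = TRUE,
    # so numeric labels such as 1-5 are recoded (1 -> "X1", 2 -> "X2", ...)
    feature_data$class <- factor(make.names(feature_data$class))

    hh_ids <- feature_data$hh_id                 # keep a copy for the later combination step
    folds5 <- groupKFold(feature_data$hh_id, k = 5)
    feature_data <- feature_data[, !names(feature_data) == "hh_id"]

    fitControl <- trainControl(method = "cv",
                               number = 5,
                               index = folds5,
                               sampling = "down",
                               savePredictions = "final",  # keep hold-out predictions of the best mtry only
                               classProbs = TRUE)          # add one probability column per class

    set.seed(1)
    rf_mod <- train(class ~ ., data = feature_data,
                    method = "rf",
                    metric = "Accuracy",
                    ntree = 500,
                    trControl = fitControl)

    # cross-validated (hold-out) class probabilities, one column per class level
    cv_pred  <- rf_mod$pred
    cv_probs <- cv_pred[, levels(feature_data$class)]

    # rowIndex maps each hold-out prediction back to its row in feature_data,
    # so hh_id can be re-attached for combining results per household
    cv_probs$hh_id <- hh_ids[cv_pred$rowIndex]

With this, rf_mod$pred contains the predicted class, the observed class and one probability column per class for every held-out sample, which is what you need for the per-hh_id combination mentioned in the question.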
Similar Posts:
- Solved – Multi- Class probabilities of Random Forest inside caret Model
- Solved – Probabilities of classes using h2o.predict
- Solved – Why does randomForest confusion matrix not match the one I calculate using predictions from the model object