How can I find the mean and standard deviation of the error rate (or accuracy) for each fold of a k-fold cross-validation run of a k-nearest-neighbour (KNN) classification model?


#### Best Answer

The mean and standard deviation of your metrics are calculated across the results of all cross-validation (CV) partitions. So, if you have 10 CV partitions with 10 repeats, you will obtain 100 sets of metrics, which in turn are used to compute the mean and standard deviation of each metric. This is not limited to KNN but applies to all models used with CV, so this should also answer your other question.
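To make the arithmetic concrete, here is a minimal sketch of that aggregation step in R. The vector `fold_err` is hypothetical: it stands in for the error rate measured on each of the 10 folds × 10 repeats = 100 CV partitions.

```r
# Hypothetical per-partition error rates for 10 folds x 10 repeats.
set.seed(1)
fold_err <- runif(100, min = 0.02, max = 0.08)

length(fold_err)  # 100 partitions in total
mean(fold_err)    # mean error rate across all partitions
sd(fold_err)      # standard deviation across all partitions
```

The same two calls, `mean()` and `sd()`, applied to the per-partition accuracies instead of error rates, give the summary accuracy figures reported by CV tools.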

Assuming you are using software like R: this is computed by the software already, so there is no need to do it on your own. For the purpose of understanding, here is a minimal working example showing how to calculate it by hand anyway:

```r
> library(caret)
> m <- train(iris[,1:4], iris[,5],
+            method = 'knn',
+            tuneGrid = expand.grid(k = 1),
+            trControl = trainControl(method = 'repeatedcv',
+                                     number = 10,
+                                     repeats = 10))
> print(m)
[...]
Resampling results

  Accuracy  Kappa  Accuracy SD  Kappa SD
  0.96      0.94   0.0454       0.0682

> head(m$resample)  # performances for individual partitions
   Accuracy Kappa     Resample
1 0.9333333   0.9 Fold01.Rep01
2 1.0000000   1.0 Fold02.Rep01
3 1.0000000   1.0 Fold03.Rep01
4 1.0000000   1.0 Fold04.Rep01
5 0.9333333   0.9 Fold05.Rep01
6 1.0000000   1.0 Fold06.Rep01
[...]
> print(apply(m$resample[,1:2], MARGIN = 2, mean))  # calculate the mean yourself
Accuracy    Kappa
    0.96     0.94
> print(apply(m$resample[,1:2], MARGIN = 2, sd))    # calculate the sd yourself
  Accuracy      Kappa
0.04544332 0.06816498
```

### Similar Posts:

- Solved – Confidence and Prediction Intervals and Cross-Validation
- Solved – How to get Sub-Training and Sub-Test from cross validation in Caret
- Solved – Should logistic regression models generated with and without cross validation in the caret.train function in R be the same
- Solved – Why does the model consistently perform worse in cross-validation