Here is the setup:
I have 90% of the data for training and the other 10% for testing.
I am doing stratified cross-validation on the 90% training set. It is a 10-class dataset, and I am using LibSVM. When doing 10-fold cross-validation to tune the hyperparameters (C in the C-SVM), I get accuracies of 100%. Basically something like this:
Training with 0.03125 - Cross Validation Accuracy = 68.5097%
Training with 0.12500 - Cross Validation Accuracy = 98.3%
Training with 0.50000 - Cross Validation Accuracy = 100%
Training with 2.00000 - Cross Validation Accuracy = 100%
Training with 8.00000 - Cross Validation Accuracy = 100%
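For context, the tuning loop looks roughly like this minimal sketch (using LibSVM's Python interface; depending on the install the import may be `from svmutil import ...`, and the file name `train.libsvm` is a hypothetical placeholder for the 90% training split):

```python
# Sketch of the C tuning loop. svm_train with the -v option runs
# cross-validation internally and returns the accuracy for classification.
from libsvm.svmutil import svm_read_problem, svm_train

y, x = svm_read_problem("train.libsvm")  # hypothetical training file

for c in [0.03125, 0.125, 0.5, 2.0, 8.0]:
    # -v 10: 10-fold cross-validation; -c: the C of the C-SVM
    acc = svm_train(y, x, f"-v 10 -c {c}")
    print(f"Training with {c:.5f} - Cross Validation Accuracy = {acc:.4f}%")
```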
Is it OK to get 100% accuracy in cross-validation on the TRAINING data? In that case, should I choose C = 0.5 as the best hyperparameter?
Or should I instead move away from parameters that give me 100% in cross-validation? And why?
If I shouldn't take those with 100%, which should I take: 98%? 90%?
Thanks,
Best Answer
A cross-validation accuracy of 100% is not necessarily a problem in itself: if the classes are well separated, the classifier may simply get every held-out example right, and I wouldn't say that C > 0.5 is necessarily that big. However, NEVER make any model choices based on the test set, as this would give an optimistically biased performance estimate. The best approach is to use nested cross-validation, where the outer cross-validation is used for performance estimation and the hyper-parameters are tuned independently within each outer fold using an inner cross-validation (i.e. if you use 10-fold outer cross-validation, you perform 10 separate inner cross-validations to tune the hyper-parameters).
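To make that concrete, here is a minimal sketch of nested cross-validation using scikit-learn (whose SVC wraps LibSVM); the synthetic 10-class data from make_classification are just a stand-in for your real dataset, and the C grid is taken from the question:

```python
# Minimal sketch of nested cross-validation with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic 10-class stand-in for the real training split.
X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_classes=10, random_state=0)

param_grid = {"C": [0.03125, 0.125, 0.5, 2.0, 8.0]}  # the grid from the question

inner_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)

# Inner loop: tune C independently within each outer training fold.
tuned_svm = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=inner_cv)

# Outer loop: estimate the performance of the whole tuning procedure.
scores = cross_val_score(tuned_svm, X, y, cv=outer_cv)
print("Nested CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```

Because C is chosen inside each outer training fold, the held-out outer fold never influences the model choice, so the outer score estimates the performance of the whole fitting-plus-tuning procedure rather than of one already-tuned model.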
Similar Posts:
- Solved – Cross-validation accuracy interpretation: accuracy of 100%
- Solved – K nearest neighbors with nested cross validation
- Solved – CV for model parameter tuning AND then model evaluation