Okay so I run this model manually and get around 80-90% accuracy:
mlp = MLPClassifier(hidden_layer_sizes=(50, 50), activation="logistic", max_iter=500)
mlp.out_activation_ = "logistic"
mlp.fit(X_train, Y_train)
predictions = mlp.predict(X_test)
print(confusion_matrix(Y_test, predictions))
print(classification_report(Y_test, predictions))
Then, I do some 10-fold cross validation:
print(cross_val_score(mlp, X_test, Y_test, scoring='accuracy', cv=10))
And I get accuracy stats something like the following for each fold:
[0.72527473 0.72222222 0.73333333 0.65555556 0.68888889 0.70786517
0.69662921 0.75280899 0.68539326 0.74157303]
I've done this about 5 times now. Every time I run the model on its own, I get 80-90% accuracy, but then when I run cross-validation, my model is averaging 10-20% less than when the model is run once manually.
The chances of getting the best model first time, five times in a row, are 1 in 161,051 ((1/11)^5). So I must just be doing something wrong somewhere.
Why does my model consistently perform worse in cross-validation?
EDIT – I'd like to add that I'm doing exactly the same thing with a RandomForestClassifier() and getting the expected results, i.e. the accuracy obtained when I run the model manually is around the same as when it is run by the cross_val_score() function. So what is it about my MLPClassifier() that's producing this mismatch in accuracy?
Best Answer
I think there is some confusion about what is actually being compared here. First, a model is trained on the X_train/Y_train dataset. When that fitted model is evaluated on the X_test/Y_test (holdout) dataset, an accuracy of 80-90% is observed. Next, cross_val_score is called with X_test/Y_test. This does not score the already-fitted mlp at all: it clones the estimator and, for each of the 10 folds, fits a brand-new model on nine tenths of the test set and scores it on the remaining fold.
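For concreteness, here is a minimal sketch of roughly what cross_val_score does with the arguments shown in the question (the real implementation uses stratified folds for classifiers and is more general; the sketch also assumes X_test and Y_test are NumPy arrays):
from sklearn.base import clone
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score

# Roughly what cross_val_score(mlp, X_test, Y_test, cv=10) does:
# the already-fitted mlp is ignored; an unfitted clone is trained on each split.
fold_scores = []
for fit_idx, score_idx in KFold(n_splits=10).split(X_test):
    fold_model = clone(mlp)                            # fresh, unfitted copy
    fold_model.fit(X_test[fit_idx], Y_test[fit_idx])   # ~90% of the *test* set only
    preds = fold_model.predict(X_test[score_idx])
    fold_scores.append(accuracy_score(Y_test[score_idx], preds))
print(fold_scores)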
The question asked is why the single score on the holdout X_test/Y_test differs from the 10 fold scores. I believe the issue is that, based on the code given in the question, the metrics are being obtained from different models trained on different amounts of data. The 80-90% score comes from running mlp.predict() on the test set with a model fitted on the full training set, while the lower fold scores (roughly 65-75%) come from fresh models that each see only nine tenths of the much smaller test set.
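If the goal is to check whether the 80-90% holdout figure is representative, a more comparable check (just a sketch, using the same X_train/Y_train/X_test/Y_test names as in the question) is to cross-validate on the training data and keep the test set as a final holdout:
from sklearn.model_selection import cross_val_score

# Each fold model now trains on ~90% of the full training set, which is much
# closer to what the single manual fit saw; X_test/Y_test stays untouched as a
# final holdout for whichever model you eventually pick.
cv_scores = cross_val_score(mlp, X_train, Y_train, scoring='accuracy', cv=10)
print(cv_scores.mean(), cv_scores.std())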