I'm testing the performance of two binary classifiers on a simulated dataset. Classifier 1 has a higher MSE (mean squared error / classification error) than classifier 2, but it also has a greater area under the ROC curve. These results seem contradictory: selecting by MSE favors classifier 2, while selecting by area under the ROC curve favors classifier 1. Does anyone have advice on how to reconcile these results?

Some more info: it seems like classifier 2 is essentially only predicting one class for all of its predictions, whereas classifier 1 is more balanced in its predictions.


#### Best Answer

Suppose you're estimating the posterior probabilities of class membership for a binary problem. I'll denote the two classes by Group 1 and Group 2.

Now suppose that we fit a model and, on a holdout set, we get the following histogram of our predicted probabilities, where each point in the histogram corresponds to $\hat P(y_i = \textrm{Group 1} \mid x_i)$:

The reddish histogram corresponds to the probabilities for observations that truly belong to Group 1, while the blue histogram is for observations that truly belong to Group 2. In this case we have perfectly separated the two classes, so we'll get an AUC of 1: AUC depends only on how the predictions are ranked, not on where they fall relative to any threshold. But if we threshold our probabilities at 1/2 (not necessarily the best thing to do, but this is common), every predicted probability falls below the threshold, so we label everything as Group 2 and misclassify half of this set.
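To make this concrete, here is a minimal sketch in plain Python. The score ranges, sample sizes, and helper names (`auc`, `error_rate`) are made up for illustration and are not taken from the question's data. It assumes balanced classes whose predicted probabilities of Group 1 are perfectly separated but all lie below 1/2, and it computes AUC as the fraction of (Group 1, Group 2) pairs that are ranked correctly.

```python
import random


def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def error_rate(scores_pos, scores_neg, threshold=0.5):
    """Fraction misclassified when predicting Group 1 iff score >= threshold."""
    missed_pos = sum(1 for p in scores_pos if p < threshold)
    false_pos = sum(1 for n in scores_neg if n >= threshold)
    return (missed_pos + false_pos) / (len(scores_pos) + len(scores_neg))


rng = random.Random(0)

# Hypothetical predicted P(Group 1 | x): the two groups never overlap,
# but every score is below 1/2 (an assumption made for this illustration).
group1_scores = [rng.uniform(0.30, 0.45) for _ in range(500)]
group2_scores = [rng.uniform(0.05, 0.20) for _ in range(500)]

auc_value = auc(group1_scores, group2_scores)
err_half = error_rate(group1_scores, group2_scores, threshold=0.5)
err_shifted = error_rate(group1_scores, group2_scores, threshold=0.25)

print("AUC:", auc_value)                        # 1.0
print("error at threshold 0.5:", err_half)      # 0.5
print("error at threshold 0.25:", err_shifted)  # 0.0
```

The ranking is perfect, so the AUC is 1, yet the 1/2 cutoff sends every observation to Group 2. Moving the cutoff into the gap between the two groups (0.25 here) removes the errors without changing the model at all.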

Based on what you've described it seems like this sort of thing could be happening to you.
