Recently, I built a classification model on an imbalanced data set (the positive class is the minority and the negative class is the majority), and the model gave the following results on the test set:
True Positives = 0
True Negatives = 139
False Positives = 0
False Negatives = 10
My question is: for this result, can the Matthews correlation coefficient (MCC) and the F-measure be used to evaluate the classifier?
Since the denominators of both MCC and the F-measure are zero here, the values seem meaningless. If so, MCC and the F-measure do not always work for evaluating a classifier, and sensitivity and specificity, as well as the g-mean, would be better. Is that right?
Any help is appreciated.
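To make the arithmetic concrete, here is a minimal Python sketch that plugs the four counts above into each metric. The helper `safe_ratio` is a hypothetical name used only for illustration, and reporting MCC as 0 when its denominator is zero is one common convention rather than a universal rule. With these counts, precision is 0/0 and the MCC denominator is 0, while sensitivity is 0, specificity is 1 and the g-mean is 0, so the g-mean also rates this classifier as useless.

```python
import math

# Confusion-matrix counts from the question
TP, TN, FP, FN = 0, 139, 0, 10


def safe_ratio(num, den):
    """Hypothetical helper: return num / den, or None when den == 0 (undefined)."""
    return num / den if den != 0 else None


precision = safe_ratio(TP, TP + FP)      # 0 / 0 -> undefined (None)
sensitivity = safe_ratio(TP, TP + FN)    # recall, true positive rate: 0 / 10
specificity = safe_ratio(TN, TN + FP)    # true negative rate: 139 / 139

# Both factors are defined for these counts, so the g-mean can be computed
g_mean = math.sqrt(sensitivity * specificity)

# MCC denominator sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)) is zero because TP + FP == 0
mcc_den = math.sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
# Assumed convention: report MCC = 0 when the denominator vanishes
mcc = (TP * TN - FP * FN) / mcc_den if mcc_den != 0 else 0.0

print("precision:", precision)
print("sensitivity:", sensitivity, "specificity:", specificity, "g-mean:", g_mean)
print("MCC denominator:", mcc_den, "MCC (by convention):", mcc)
```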
Best Answer
This is only really a problem if you compute precision and recall first and then plug them into the formula.
One can also compute the $F_1$ score as $$F_1 = \frac{2 \cdot \textrm{True Positive}}{2 \cdot \textrm{True Positive} + \textrm{False Positive} + \textrm{False Negative}}$$
Plugging in your numbers gives $F_1 = \frac{2 \cdot 0}{2 \cdot 0 + 0 + 10} = 0$, which seems appropriate since your classifier always predicts the majority class.
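A minimal Python sketch contrasting the two routes to $F_1$ for these counts; both function names are hypothetical. The count-based formula returns 0, whereas computing precision first raises a `ZeroDivisionError` because precision is 0/0.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined unless tp, fp and fn are all zero."""
    return 2 * tp / (2 * tp + fp + fn)


def f1_from_precision_recall(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall.

    Raises ZeroDivisionError whenever tp == 0: either precision or recall
    is 0/0, or both are 0 and their sum in the final denominator is 0.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


TP, FP, FN = 0, 0, 10
print(f1_from_counts(TP, FP, FN))

try:
    f1_from_precision_recall(TP, FP, FN)
except ZeroDivisionError:
    print("precision is 0/0, so the precision/recall route breaks down")
```

Both routes agree whenever TP > 0; the count-based form only fails when there are no actual positives and no predicted positives at all.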
There is an information-theoretic measure called proficiency that might be of interest if you are working with fairly unbalanced data sets. The idea is that it should remain sensitive to both classes even as the number of true positives or true negatives approaches zero. It is essentially the mutual information between the predicted and actual labels divided by the entropy of the actual labels: $$\frac{I(\textrm{predicted labels}; \textrm{actual labels})}{H(\textrm{actual labels})}$$
See pages 5–7 of White et al. (2004) for more details about its calculation and interpretation.
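A minimal Python sketch of the ratio above, computed from the four confusion-matrix counts. The function name `proficiency` and the choice to return NaN when only one class is present are assumptions for illustration; White et al. (2004) remains the authoritative source for the definition and interpretation. A classifier that always predicts the negative class carries no information about the actual labels, so the mutual information, and hence the proficiency, is exactly 0 for the counts in the question.

```python
import math


def proficiency(tp, fn, fp, tn):
    """I(predicted; actual) / H(actual), computed from confusion-matrix counts.

    The base of the logarithm cancels in the ratio, so natural logs are used.
    Returns NaN (an assumed convention) if H(actual) == 0, i.e. one class is absent.
    """
    n = tp + fn + fp + tn
    # Counts keyed by (actual, predicted), with 1 = positive and 0 = negative
    counts = {(1, 1): tp, (1, 0): fn, (0, 1): fp, (0, 0): tn}
    actual = {a: counts[(a, 1)] + counts[(a, 0)] for a in (0, 1)}
    predicted = {p: counts[(1, p)] + counts[(0, p)] for p in (0, 1)}

    # I = sum over non-empty cells of p(a,p) * log[p(a,p) / (p(a) p(p))],
    # written with integer counts: p(a,p) / (p(a) p(p)) = n*c / (c_a * c_p)
    mi = sum((c / n) * math.log(n * c / (actual[a] * predicted[p]))
             for (a, p), c in counts.items() if c > 0)
    # H(actual) = -sum p(a) log p(a)
    h_actual = -sum((k / n) * math.log(k / n) for k in actual.values() if k > 0)
    return mi / h_actual if h_actual > 0 else float("nan")


print(proficiency(tp=0, fn=10, fp=0, tn=139))
```

Working with integer counts inside the logarithm keeps the ratio exactly 1 for a constant predictor, so the result is 0 without floating-point residue; a perfect classifier gives a proficiency of 1.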
Similar Posts:
- Solved – Is dummy classifier precision always 0.5, even on unbalanced datasets
- Solved – Calculate precision and recall
- Solved – Manual and automated calculation of false positive rate in confusion matrix do not agree
- Solved – Weighting true positives in a loss function