Solved – Minimizing False Negatives with Multinomial Naive Bayes

I currently have a problem where I am trying to classify medical abstracts, where some are relevant and some aren't. I have tried an SVM, Multinomial Naive Bayes and Random Forest, and found that MNB works best for the task at hand. The only issue is that my recall for the "relevant" class is lower than it should be. For the datasets I've tested on, my recall is between 0.6 and 0.85.

Ideally I want the recall for this class to be 0.95+, even if it comes with a decrease in overall classification rate. I would like to minimize the number of False Negatives as much as possible, even if that means the number of False Positives shoots up.

I'm using scikit-learn to implement my models. I tried running a grid search with different alphas and my own scoring function (recall for the class I want), but it just returned the same parameters I was using before. Roughly what I ran looked like the sketch below (the alpha grid and the toy count data are just placeholders, not my actual corpus).
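```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer, recall_score

rng = np.random.RandomState(0)
X = rng.randint(0, 5, size=(200, 20))   # stand-in term-count matrix
y = rng.randint(0, 2, size=200)         # 1 = relevant, 0 = not relevant

# Score only on recall for the "relevant" class.
relevant_recall = make_scorer(recall_score, pos_label=1)

grid = GridSearchCV(
    MultinomialNB(),
    param_grid={"alpha": [0.01, 0.1, 0.5, 1.0, 2.0]},
    scoring=relevant_recall,
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```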

Any help would be greatly appreciated.

You don't need to change anything in the model itself; you just need to change where you put your decision threshold.

For example, naive Bayes returns a probability for the positive class, and by default scikit-learn places the decision threshold at 0.5. By getting the probabilities out of the classifier instead of the predicted labels, you can shift this decision threshold to better suit your particular needs.
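A minimal sketch of what that looks like, assuming label 1 means "relevant" and using an illustrative threshold of 0.3 (the toy count data just stands in for your abstracts):

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

rng = np.random.RandomState(0)
X_train = rng.randint(0, 5, size=(200, 20))
y_train = rng.randint(0, 2, size=200)
X_test = rng.randint(0, 5, size=(50, 20))

clf = MultinomialNB().fit(X_train, y_train)

# Column of predict_proba that corresponds to the "relevant" class.
pos_col = list(clf.classes_).index(1)
proba_relevant = clf.predict_proba(X_test)[:, pos_col]

# clf.predict(X_test) is equivalent to a 0.5 threshold here; lowering the
# threshold trades extra false positives for fewer false negatives.
threshold = 0.3
y_pred = (proba_relevant >= threshold).astype(int)
print(y_pred[:10])
```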

This may come at the cost of some overall accuracy. The best way to decide the optimal threshold for your particular case is to look at the ROC curve (scikit-learn provides an implementation), where you plot the True Positive Rate (TPR) against the False Positive Rate (FPR) as you vary the decision threshold (in the case of naive Bayes, from 0 to 1). Just pick a point that gets you a recall (the same as TPR) above 0.95 and make sure you're not compromising too much on FPR (if you are, look at the ROC curves of other classifiers).
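Something along these lines, where the 0.95 recall target and the toy data are placeholders for your own setup:

```python
import numpy as np
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

rng = np.random.RandomState(0)
X = rng.randint(0, 5, size=(400, 20))
y = rng.randint(0, 2, size=400)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = MultinomialNB().fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, list(clf.classes_).index(1)]

# One (FPR, TPR) point per candidate threshold; thresholds are returned in
# decreasing order, so TPR is non-decreasing along the arrays.
fpr, tpr, thresholds = roc_curve(y_test, scores, pos_label=1)

meets_target = tpr >= 0.95
if meets_target.any():
    i = np.argmax(meets_target)  # largest threshold that still reaches the target recall
    print(f"threshold={thresholds[i]:.3f}  TPR={tpr[i]:.3f}  FPR={fpr[i]:.3f}")
```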
