# Solved – Different number of samples (observations) per class (one vs. all classification)

I have 20 observations labeled for each class.
The number of classes is 5, so I have a total of 100 observations.
I want to classify one class vs. other classes (one vs. all).

For this, I first labeled one of the five classes '1' (= 20 observations) and the other classes '2' (= 80 observations). I then split the whole data set into a training set and a test set for each class ('1' or '2'). Since I used 10-fold cross-validation, the 20 observations for class '1' were separated into 18 training and 2 test observations (72 training and 8 test observations for class '2'). I trained an LDA classifier on the training set and estimated the classification accuracy on the test set.
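The setup described above can be sketched with scikit-learn; this is a minimal illustration on synthetic stand-in data (the feature values and random seed are my own assumptions, not from the question):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
# Synthetic stand-in: 20 "signal" and 80 "background" observations, 5 features.
X = np.concatenate([rng.normal(1.0, 1.0, size=(20, 5)),
                    rng.normal(-1.0, 1.0, size=(80, 5))])
y = np.array([1] * 20 + [2] * 80)  # one class vs. the rest

accuracies = []
# StratifiedKFold keeps the 20/80 ratio in every fold:
# each test fold holds 2 class-'1' and 8 class-'2' observations.
for train_idx, test_idx in StratifiedKFold(n_splits=10, shuffle=True,
                                           random_state=0).split(X, y):
    clf = LinearDiscriminantAnalysis().fit(X[train_idx], y[train_idx])
    accuracies.append(clf.score(X[test_idx], y[test_idx]))

print(np.mean(accuracies))
```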

My question is: isn't there a bias problem when using different numbers of observations per class to train a classifier? In my case the difference is quite large (20 vs. 80), and I am worried about this. If so, please suggest a proper solution (e.g., a specific classifier) and a reference paper on how to overcome this problem.


This is answered by Bayesian statistics. The training-sample signal-to-background ratio 20/80 is your prior probability. The classifier models your likelihood, which does not depend on the signal-to-background ratio. This is easy to see: the predicted values for the signal samples are the same whether you compute them on the whole sample (signal plus background) or only on the signal subsample. So the a posteriori probability (the predicted signal probability) depends only on the prior ratio.
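The prior/likelihood separation can be made concrete: scikit-learn's LDA accepts an explicit `priors` argument, so the same fitted likelihood can be combined with different priors. A minimal sketch on synthetic data (the data and the query point are illustrative assumptions):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# 20 "signal" (class 1) vs. 80 "background" (class 2) observations.
X = np.concatenate([rng.normal(1.0, 1.0, size=(20, 2)),
                    rng.normal(-1.0, 1.0, size=(80, 2))])
y = np.array([1] * 20 + [2] * 80)

# Same likelihood model, two different priors:
lda_data_prior = LinearDiscriminantAnalysis().fit(X, y)          # 20/80 prior from the data
lda_flat_prior = LinearDiscriminantAnalysis(priors=[0.5, 0.5]).fit(X, y)

x_new = np.array([[0.0, 0.0]])  # a point roughly equidistant from both class means
p_data = lda_data_prior.predict_proba(x_new)[0, 0]  # P(class 1 | x) under the 20/80 prior
p_flat = lda_flat_prior.predict_proba(x_new)[0, 0]  # P(class 1 | x) under a flat prior
print(p_data, p_flat)
```

Since the likelihoods at `x_new` are roughly equal, the flat prior yields a noticeably higher predicted probability for the rare class.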

In your example, the ratio is the same with or without cross-validation, so you are safe.

In addition, even if you had trained with a different ratio, it would be possible to correct the posterior probability using Bayes' theorem.
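That prior correction can be written down directly. A minimal sketch (the function and parameter names are my own):

```python
def correct_posterior(p_train, prior_train, prior_true):
    """Re-weight a posterior P(signal | x) obtained under the training prior
    `prior_train` to the deployment prior `prior_true` via Bayes' theorem."""
    # Divide out the training prior odds to recover the likelihood ratio:
    likelihood_ratio = (p_train / (1.0 - p_train)) * ((1.0 - prior_train) / prior_train)
    # Re-apply the correct prior odds:
    corrected_odds = likelihood_ratio * (prior_true / (1.0 - prior_true))
    return corrected_odds / (1.0 + corrected_odds)
```

As a sanity check, when the training and true priors agree the posterior is unchanged, and raising the true prior raises the corrected posterior.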

EDIT: Be careful with the interpretation of the classification results. Sensitivity and specificity (often also called purity and efficiency, or expressed via type I and type II errors) depend heavily on the prior distribution. The purity/efficiency measured on your training mixture of signal and background is meaningless on its own; it only makes sense to compare classification power on the mixture in your test sample (e.g., compare two classifiers on the same test sample). Imagine your test sample did not contain a single signal sample, while your training sample of course did: what would the sensitivity and specificity of your classifier be?

In addition, purity and efficiency are also functions of the cut value at which you classify a sample as signal or background. Depending on the ratio in your test sample, you can simply choose a different cut value and recover the sensitivity and specificity seen in training. Have you tried different values?
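Sweeping the cut value makes this trade-off explicit. A minimal sketch with synthetic classifier scores (the score distributions are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic scores: signal scores tend to be higher than background scores.
signal_scores = rng.normal(loc=1.0, size=200)
background_scores = rng.normal(loc=-1.0, size=800)

for cut in (-0.5, 0.0, 0.5):
    sensitivity = float(np.mean(signal_scores > cut))        # true-positive rate
    specificity = float(np.mean(background_scores <= cut))   # true-negative rate
    print(f"cut={cut:+.1f}  sensitivity={sensitivity:.2f}  specificity={specificity:.2f}")
```

Raising the cut trades sensitivity for specificity; the "right" cut therefore depends on the signal-to-background ratio you expect at test time.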
