Solved – How to deal with a biased dataset in both training and test data

I am currently working on a classification problem with a highly biased (class-imbalanced) dataset. Both the training data and the test data are biased, and I am having trouble working out how to adjust either the dataset or the model.

For example, I have 30 classes, and 70% of the samples belong to classes A and B.

I have tried expanding the dataset to make my model more robust. However, the model then performs worse on the test set, since the test set is also biased.

I am using a deep learning model with cross-entropy loss, and I have also tried a weighted cross-entropy loss. What else can I try to reduce the impact of the bias?
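For readers unfamiliar with the weighted loss mentioned above, here is a minimal sketch of one common variant, not necessarily the exact setup used in this question. It assumes inverse-frequency class weights and a loss normalized by the sum of the weights; the function names and the toy two-class data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalized by the sum of weights."""
    total = 0.0
    weight_sum = 0.0
    for p, y in zip(probs, labels):
        w = weights[y]
        total += -w * math.log(p[y])
        weight_sum += w
    return total / weight_sum


# Toy data (assumed): 7 samples of a majority class, 3 of a minority class.
labels = ["A"] * 7 + ["C"] * 3
weights = inverse_frequency_weights(labels)

# Hypothetical predicted probabilities: confident on "A", weak on "C".
probs = [{"A": 0.9, "C": 0.1}] * 7 + [{"A": 0.6, "C": 0.4}] * 3
loss = weighted_cross_entropy(probs, labels, weights)
```

With these weights, mistakes on the rare class contribute more to the loss than they would under plain cross-entropy, which is the intended effect of the weighting.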

What have you already tried to balance your dataset? Basic methods such as random over- or undersampling are often used; if these do not help, you might want to try more advanced sampling methods. Also keep in mind that you should re-balance only the data used for training, and NOT the data used to evaluate your models.
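As a rough illustration of random oversampling applied to the training split only, the sketch below duplicates randomly chosen minority-class samples until every class matches the largest one. The function name `random_oversample`, the fixed seed, and the toy data are assumptions made for this example; the test split would be left untouched.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate random samples of each class until all classes are equally large."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    # Every class is grown to the size of the largest class.
    target = max(len(xs) for xs in by_class.values())

    pairs = []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        pairs.extend((x, y) for x in xs + extra)

    rng.shuffle(pairs)
    return [x for x, _ in pairs], [y for _, y in pairs]


# Toy training split (assumed): sample i has label train_y[i].
train_x = list(range(10))
train_y = ["A"] * 7 + ["C"] * 3

# Re-balance the training data only; evaluation data keeps its original distribution.
balanced_x, balanced_y = random_oversample(train_x, train_y)
counts = Counter(balanced_y)
```

Oversampling keeps every original sample and only adds duplicates, so it does not discard data the way undersampling does, at the cost of a higher risk of overfitting to the repeated minority samples.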
