I am currently working on a classification problem with a highly imbalanced dataset. Both the training and the test data are imbalanced, and I am having trouble deciding how to handle the data or how to modify the model.
For example, I have 30 classes, and roughly 70% of the samples belong to classes A and B.
I have tried expanding the dataset to make my model more robust, but this leads to worse performance on the test set, since the test set is also imbalanced.
I am using a deep learning model with cross-entropy loss and have also tried weighted cross-entropy loss. I wonder what else I can try to reduce the impact of the imbalance.
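For reference, this is roughly what I mean by weighted cross-entropy; a minimal PyTorch sketch using inverse-frequency weights (the class counts below are made up for illustration):

```python
import torch
import torch.nn as nn

# Made-up class counts for the 30 classes; two dominant classes, 28 rare ones.
class_counts = torch.tensor([7000.0, 7000.0] + [200.0] * 28)

# Inverse-frequency weights, normalized so they average to 1.
weights = class_counts.sum() / (len(class_counts) * class_counts)

criterion = nn.CrossEntropyLoss(weight=weights)

# logits: (batch, 30), targets: (batch,) with class indices 0..29
logits = torch.randn(8, 30)
targets = torch.randint(0, 30, (8,))
loss = criterion(logits, targets)
```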
Best Answer
What did you already try to balance your dataset? Basic methods such as random over- and undersampling are often used, but if they do not help, you might want to try more advanced sampling methods (for example, SMOTE-style synthetic oversampling). Additionally, keep in mind that you should only re-balance the training data and NOT the data used to evaluate your models.
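As a minimal sketch of that last point (assuming scikit-learn and the imbalanced-learn package are available; the synthetic data only stands in for your own features and labels): split first, then re-balance only the training portion.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import RandomOverSampler

# Synthetic stand-in for your data: 30 classes, ~70% of samples in two classes.
X, y = make_classification(
    n_samples=5000,
    n_features=20,
    n_informative=6,
    n_classes=30,
    n_clusters_per_class=1,
    weights=[0.35, 0.35] + [0.30 / 28] * 28,
    random_state=42,
)

# Split first, so the test set keeps the original (imbalanced) distribution.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Re-balance ONLY the training data; the test set is left untouched.
ros = RandomOverSampler(random_state=42)
X_train_bal, y_train_bal = ros.fit_resample(X_train, y_train)

# Train on (X_train_bal, y_train_bal), evaluate on the original (X_test, y_test).
```

If you want synthetic minority samples instead of duplicates, SMOTE from the same package can be swapped in for RandomOverSampler.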