I was thinking of using validation, but I'm not quite sure how to go about it. Please list some papers or ideas on how to do this. This is for a multi-class problem (using a one-vs-all approach). I think each class/category converges to a good strong classifier at its own speed, and hence needs a different number of weak classifiers to avoid overfitting.
You could use a hold-out validation set. Break your training data into two pieces, which we'll call the training and validation sets.
Run AdaBoost on the training set.
After each iteration of AdaBoost where a new weak classifier is generated, run the meta-classifier composed of all weak classifiers created so far on the validation set.
If the performance on the validation set has dropped relative to the performance in the previous iteration, stop and use the meta-classifier from the previous iteration as your final classifier.
Or, if you're worried about local optima, just stop if it is much worse (whatever much means) than the best classifier you've seen so far.
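To make the procedure concrete, here is a minimal sketch in plain Python: a toy AdaBoost with decision stumps that scores the meta-classifier on the validation set after every round, remembers the best round seen so far (the "best so far" variant above), and truncates the ensemble to that round. The function names and the toy diagonal dataset are my own for illustration, not a production implementation.

```python
import math
import random

def stump_predict(stump, x):
    # stump = (feature, threshold, polarity); predicts +1 or -1
    feature, threshold, polarity = stump
    return polarity if x[feature] <= threshold else -polarity

def train_stump(X, y, w):
    # Exhaustively pick the decision stump with minimum weighted error.
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict((f, t, pol), xi) != yi)
                if err < best_err:
                    best, best_err = (f, t, pol), err
    return best, best_err

def adaboost_with_validation(X_tr, y_tr, X_val, y_val, max_rounds=20):
    n = len(X_tr)
    w = [1.0 / n] * n                  # uniform initial weights
    ensemble = []                      # list of (alpha, stump)
    best_val, best_len = -1.0, 0
    for _ in range(max_rounds):
        stump, err = train_stump(X_tr, y_tr, w)
        if err >= 0.5:                 # no better than chance: stop
            break
        err = max(err, 1e-10)          # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Reweight: misclassified training examples gain weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X_tr, y_tr, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        # Score the current meta-classifier on the validation set.
        def meta(x):
            s = sum(a * stump_predict(st, x) for a, st in ensemble)
            return 1 if s >= 0 else -1
        acc = sum(meta(x) == yv for x, yv in zip(X_val, y_val)) / len(X_val)
        if acc > best_val:             # remember the best round so far
            best_val, best_len = acc, len(ensemble)
    # Truncate to the round with the best validation score.
    return ensemble[:best_len], best_val

# Toy data: label +1 above the diagonal x0 + x1 = 1, else -1.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [1 if xi[0] + xi[1] > 1 else -1 for xi in X]
ensemble, acc = adaboost_with_validation(X[:150], y[:150], X[150:], y[150:])
```

If you use scikit-learn instead, `AdaBoostClassifier.staged_predict` gives you the per-iteration predictions directly, so you can compute the same per-round validation curve without retraining.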
Importantly, this approach reduces the amount of training data available to you. There will be a trade-off between reducing over-fitting (by taking a large, and thus representative, validation set) and under-fitting (since with less data to work with, your classifiers may not be able to learn all the patterns in the set).
Thus, picking the proper size of validation set is non-trivial. Some important considerations though:
If your data has class imbalance (which it will, since you're doing one-vs-all classification), your validation and training sets must be selected in a way that preserves the class proportions – if you put all the minority-class exemplars in one of the two sets, things aren't going to work.
If data limitations are an issue, and you are considering up/down sampling the classes, then make sure you partition off the validation set before doing so.
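Both considerations can be handled with a stratified split done before any resampling. A minimal sketch (the function name and toy labels are mine for illustration; scikit-learn's `train_test_split(..., stratify=y)` does the same thing):

```python
import random
from collections import defaultdict

def stratified_split(y, val_frac=0.2, seed=0):
    """Return (train_indices, val_indices) such that each class appears
    in both sets at roughly its original proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    train_idx, val_idx = [], []
    for label, idx in by_class.items():
        rng.shuffle(idx)
        cut = max(1, int(len(idx) * val_frac))  # keep >= 1 minority exemplar
        val_idx.extend(idx[:cut])
        train_idx.extend(idx[cut:])
    return train_idx, val_idx

# Imbalanced labels: 90 positives, 10 negatives.
y = [1] * 90 + [0] * 10
train_idx, val_idx = stratified_split(y, val_frac=0.2, seed=1)
# Any up/down-sampling would now be applied to train_idx only, so no
# duplicated minority example can leak into the validation set.
```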