I am tuning an XGBoost model and would like to know whether I need to tune the L1 or L2 regularization parameters. They seem useful in linear models to avoid overfitting, but I am working with trees (where I already limit max depth to avoid overfitting).
My dataset is multiclass.
It is hard to answer your question with only the detail you provided. As you mentioned, in general, L1 or L2 regularization helps to control overfitting.

However, in gradient boosting with trees, many parameters contribute to model complexity. For example, reducing the number of iterations or lowering the eta (learning rate) parameter will also help control overfitting. In addition, as you mentioned, we can control tree depth, etc. With so many parameters contributing to model complexity, most people simply search for the best combination on a representative validation set instead of investigating exactly how any one parameter affects the model.
My suggestion would be:

- Investigate whether the current model is overfitting or underfitting.
- If the model is overfitting, L1 and L2 regularization can be helpful; if you have time, try a grid search on them.