I am studying the CatBoost paper https://arxiv.org/pdf/1706.09516.pdf (particularly Function BuildTree on page 16), and noticed that it does not mention regularization.
In particular, split selection is based on minimizing the loss of a new candidate tree, measured by the cosine distance between the previous iteration's gradients and the tree outputs. I don't see a "lambda" parameter going in to penalize new splits.
However, the CatBoost package has the parameter l2_leaf_reg, described as the "Coefficient at the L2 regularization term of the cost function". How does that parameter work?
The value of the parameter is added to the leaf denominator for every leaf at every step. Since it is added to the denominator, the higher l2_leaf_reg is, the smaller the value each leaf will obtain.
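To make the shrinkage effect concrete, here is a minimal sketch assuming the standard gradient-boosting leaf-value formula (sum of gradients in the leaf divided by the sample count plus the regularization term); it is an illustration of the mechanism, not CatBoost's actual internals:

```python
# Hypothetical sketch: leaf_value = sum(gradients) / (n_samples + l2_leaf_reg).
# This mirrors the description above, not CatBoost's exact implementation.

def leaf_value(gradients, l2_leaf_reg):
    # l2_leaf_reg is added to the denominator, shrinking the leaf output.
    return sum(gradients) / (len(gradients) + l2_leaf_reg)

grads = [0.5, 0.8, 0.3, 0.4]  # toy gradient sums for one leaf

for lam in (0.0, 3.0, 10.0):
    print(lam, leaf_value(grads, lam))
# Larger l2_leaf_reg -> smaller leaf value: 0.5, ~0.286, ~0.143
```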
This is quite intuitive when you think of how L2 regularization is used in a typical linear regression setting.
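The linear-regression analogy can be sketched the same way: in one-dimensional ridge regression without an intercept, the closed-form coefficient is w = Σxy / (Σx² + λ), so λ sits in the denominator and shrinks w toward zero, exactly as l2_leaf_reg shrinks leaf values (a toy illustration, not tied to any particular library):

```python
# 1-D ridge regression without intercept: w = sum(x*y) / (sum(x^2) + lam).
# lam in the denominator shrinks the coefficient toward zero.

def ridge_coef(x, y, lam):
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]  # with lam = 0 the exact fit is w = 2

for lam in (0.0, 1.0, 10.0):
    print(lam, ridge_coef(x, y, lam))
# Larger lam -> smaller w: 2.0, ~1.867, ~1.167
```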