Can decision trees look multiple levels deep when selecting features to maximize information gain?

Suppose we have two features $f_1$ and $f_2$ that, when examined individually, yield very low or zero information gain relative to competing features. But further suppose that if we were to first split on $f_1$, then $f_2$ would yield high information gain. How can we ensure that the decision tree discovers this split that requires looking two levels deep to realize any information gain?

My concern is that I would not expect the tree to split on $f_1$ or $f_2$ when many competing features have higher individual information gain, so it would fail to discover the optimal combined $f_1$–$f_2$ split (short of overfitting by setting its depth and other parameters suboptimally).

When building a regression model based on decision trees (RandomForestRegressor or GradientBoostingRegressor, for example), does one need to explicitly create derived features out of $f_1$ and $f_2$ to ensure that this information is captured?
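To make the scenario concrete, here is a small pure-Python sketch (the entropy and gain helpers are written here just for illustration) of an XOR-style target, where each feature has exactly zero information gain on its own but becomes fully informative after the first split:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def split_gain(rows, labels, feature):
    """Information gain from splitting on a binary feature."""
    n = len(labels)
    gain = entropy(labels)
    for v in (0, 1):
        subset = [y for x, y in zip(rows, labels) if x[feature] == v]
        if subset:
            gain -= len(subset) / n * entropy(subset)
    return gain

# XOR-style data: y depends on f1 and f2 jointly, never individually
rows = [(f1, f2) for f1 in (0, 1) for f2 in (0, 1)] * 25
labels = [f1 ^ f2 for f1, f2 in rows]

print(split_gain(rows, labels, 0))  # gain of f1 alone: 0.0
print(split_gain(rows, labels, 1))  # gain of f2 alone: 0.0

# but after conditioning on f1 = 0, f2 becomes fully informative
rows0 = [x for x in rows if x[0] == 0]
labels0 = [y for x, y in zip(rows, labels) if x[0] == 0]
print(split_gain(rows0, labels0, 1))  # 1.0 bit
```

A greedy, one-level-lookahead builder sees two zero-gain features here, which is exactly the worry above.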

Depending on which tree implementation you're using, there is usually a regularization parameter that sets the cost of splitting: a node is split only if the resulting gain (impurity decrease) exceeds that threshold — for example min_impurity_decrease in scikit-learn, or gamma (min_split_loss) in XGBoost. By relaxing this parameter you allow more splits, and therefore explore more depth (while still keeping the tree constrained). Keep in mind that good tree implementations also include a pruning step at the end (e.g. cost-complexity pruning, ccp_alpha in scikit-learn) which re-checks whether each branch is worth keeping.
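For instance, here is a sketch (assuming scikit-learn; a classifier and an XOR-style target made up for illustration — the same parameters exist on the regressors you mention) of how that threshold changes what the tree can discover:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(400, 2))
y = X[:, 0] ^ X[:, 1]  # XOR target: neither feature helps on its own

# A high split-cost threshold rejects the near-zero-gain first split,
# so the informative second-level split is never reached.
strict = DecisionTreeClassifier(min_impurity_decrease=0.1, random_state=0).fit(X, y)

# With the threshold at zero, the greedy builder splits anyway, and the
# second level then captures the interaction.
loose = DecisionTreeClassifier(min_impurity_decrease=0.0, max_depth=2, random_state=0).fit(X, y)

print(strict.get_depth(), strict.score(X, y))  # stump: roughly chance-level accuracy
print(loose.get_depth(), loose.score(X, y))    # depth 2, accuracy 1.0
```

The point is that the tree does not need lookahead here, only permission to make a (locally) worthless split.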

In addition, many ensemble methods randomize split selection: random forests consider only a random subset of features at each node, and extremely randomized trees even draw candidate thresholds at random, partly because testing every possible split in every tree would be prohibitively expensive. A multi-tree model is therefore quite likely to capture the extra splits you mention in at least a handful of its trees, provided they do increase accuracy enough given the structure already above them.
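As a sketch of that last point (again assuming scikit-learn; the feature count and sample sizes are made up for illustration), a forest of deep trees usually recovers the $f_1$–$f_2$ interaction without hand-built derived features, even with noise features competing for splits:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(3000, 8))  # features 2..7 are pure noise
y = X[:, 0] ^ X[:, 1]                   # target depends only on the interacting pair

Xtr, Xte, ytr, yte = X[:2000], X[2000:], y[:2000], y[2000:]

# Deep trees plus per-node feature subsampling: some trees condition on f1
# early (by chance), after which f2 yields a large gain on that branch.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(Xtr, ytr)
print(forest.score(Xte, yte))  # typically close to 1.0
```

So explicit derived features are not strictly required, though adding one (say $f_1 \oplus f_2$, if you already suspect the interaction) lets a much shallower, cheaper model find it in a single split.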
