As is well known, we can calculate Relative Information Gain (RIG) as follows: $RIG = \frac{H(x) - H(x|a)}{H(x)}$.

In binary decision trees we calculate $H(x|a)$ for a univariate split on variable $x_i$ as follows: $p(x_i>a)H(x|x_i>a) + p(x_i \le a)H(x|x_i \le a)$. That means that for each point we check whether it lies to the left or to the right of the current decision boundary.
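As a minimal sketch of this computation (the function and variable names here are illustrative, not from any particular library, and the split is assumed to leave both sides non-empty):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy H(x), base 2, of a class-label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def rig_binary_split(feature, labels, a):
    """RIG of the univariate split x_i > a (assumes both sides non-empty)."""
    right = feature > a
    h = entropy(labels)
    h_cond = (right.mean() * entropy(labels[right])
              + (1 - right.mean()) * entropy(labels[~right]))
    return (h - h_cond) / h
```

For a perfectly separating split, $H(x|a) = 0$ and the RIG is 1.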

Now, my question is: is it possible to calculate $RIG$ for a complex condition in binary tree nodes, like $(x_i<3)\ \text{or}\ (x_i > 7)$? Let's say that all points with $x_i < 3$ are in zone $I$, points with $3 \le x_i \le 7$ are in zone $II$, and every point with $x_i>7$ is in zone $III$. Using this notation, is it mathematically correct to calculate the conditional entropy in this case as follows:

$p_{I}H_{I}+p_{II}H_{II}+p_{III}H_{III}$?
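This three-zone weighted entropy can be sketched on toy data (the arrays below are made up for illustration; the target is pure inside each zone, so the conditional entropy comes out to zero):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy, base 2, of a class-label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Toy data: class 1 in the middle zone, class 0 in the outer zones.
x = np.array([1, 2, 4, 5, 6, 8, 9])
y = np.array([0, 0, 1, 1, 1, 0, 0])

# Zones I, II, III for the condition (x < 3) or (x > 7).
zones = [x < 3, (x >= 3) & (x <= 7), x > 7]

# Weighted conditional entropy: p_I*H_I + p_II*H_II + p_III*H_III.
h_cond = sum(mask.mean() * entropy(y[mask]) for mask in zones)
rig = (entropy(y) - h_cond) / entropy(y)
```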

Thank you.


#### Best Answer

Any criterion that splits the data instances into disjoint regions can be used to compute any of the entropy-related criteria. The split defines regions, and the criterion is computed on those regions; for computing the criterion, it does not matter how the regions were found. In any case, finding regions is a greedy procedure, and therefore a heuristic.

The idea is that, at least for CART-like trees, this kind of split is considered unnecessary: if those regions make sense from the perspective of the criterion used, they will be discovered anyway by splitting across multiple nodes. Thus, the idea of having a split like $(x<3)\ \text{or}\ (x>7)$ was abandoned. Another reason is that searching over such compound conditions is computationally infeasible.
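That point can be illustrated on the same kind of toy data (made up for this sketch): a split at $x<3$ at the root followed by $x>7$ on the remaining points carves out the same three zones as the compound condition, and reaches a conditional entropy at least as low.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy, base 2, of a class-label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Toy data: class 1 in the middle zone, class 0 in the outer zones.
x = np.array([1, 2, 4, 5, 6, 8, 9])
y = np.array([0, 0, 1, 1, 1, 0, 0])

# Single compound split: (x < 3) or (x > 7) against the middle zone.
outer = (x < 3) | (x > 7)
h_compound = (outer.mean() * entropy(y[outer])
              + (~outer).mean() * entropy(y[~outer]))

# The same regions via two ordinary splits: x < 3 at the root,
# then x > 7 on the remaining points.
left = x < 3
right = ~left & (x > 7)
middle = ~left & ~right
h_nested = sum(m.mean() * entropy(y[m]) for m in (left, middle, right))
```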
