Can anyone please explain how splitting is performed in regression trees when we only have continuous features? I have referred to different papers, but all I could find is formulas or theorems.

Can someone please explain, with an example, how we can build a regression tree from scratch?

That would be a great help.


#### Best Answer

Tree-based models perform recursive binary splits chosen to optimize some metric, such as information gain or Gini impurity. If you have continuous variables, then at each step the algorithm looks for the variable/cutoff combination that is 'best' according to the metric used. For a discrete outcome variable, this relates to how many outcomes are correctly classified; for a continuous outcome, it could for example be the split that reduces the residual variance the most.
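To make the continuous-outcome case concrete, here is a minimal sketch (my own illustration, not code from any paper) of how a single best split is found for one continuous feature: every midpoint between consecutive distinct values is a candidate cutoff, and each candidate is scored by the total squared error around the left and right means.

```python
# Hypothetical illustration: exhaustive search for the best binary split
# of one continuous feature x against a continuous outcome y, scoring
# each candidate cutoff by the summed squared error around each side's mean.

def best_split(x, y):
    """Return (cutoff, sse) for the split that minimizes total squared error."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]

    def sse(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best = (None, float("inf"))
    # Candidate cutoffs: midpoints between consecutive distinct x values.
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue
        cutoff = (xs[i] + xs[i - 1]) / 2
        total = sse(ys[:i]) + sse(ys[i:])
        if total < best[1]:
            best = (cutoff, total)
    return best

# Two clear clusters: the search recovers the cutoff 6.5 between them.
x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = [1.1, 0.9, 1.0, 5.2, 4.8, 5.0]
print(best_split(x, y))
```

With several features, the same search simply runs per feature and the feature/cutoff pair with the lowest loss wins.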

If you have a mixture of discrete and continuous variables, the algorithm works no differently:

- Either split a continuous variable at some optimal threshold
- Or split a categorical variable based on the category that results in the largest improvement

If you really want to understand how the tree 'comes to its decision' at each step, you should study the metric used for splitting.

Edit: An example procedure using MSE

- Define a loss function $\sum_{i=1}^{k}\sum_{j=1}^{n_i}(\hat{y}_j - y_j)^2$, where $k$ is the current number of nodes (start at $k=1$) and $n_i$ is the number of observations in node $i$;
- Define some regression model. This could be just an intercept, like in André's example: $y = \beta_0 + \epsilon$, or it could include explanatory variables that you don't want to split, but rather regress on, at the terminal nodes;
- Use an optimizer (e.g. the default in R's `optim`) to minimize the loss function in (1) by considering splits among all variables. To do this, you need to obtain all $\hat{y}$ values by running your regression model from (2) on each terminal node's observations;
- Repeat (3) until some criterion has been reached (e.g. the number of observations in each node is too small to be split further, given the number of parameters in (2));
- You now have a full tree that you can prune.
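The steps above can be sketched end to end. This is my own minimal illustration under two simplifying assumptions: the per-node model from (2) is just an intercept (the node mean), and instead of growing a full tree and pruning, growth stops at a minimum node size.

```python
# Hypothetical from-scratch regression tree: greedy splits that minimize
# the summed squared error (MSE up to a constant), intercept-only node models.

def sse(ys):
    """Squared error of predicting the mean for every observation."""
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys)

def grow(x, y, min_size=2):
    """Recursively grow a tree; each node stores its mean prediction."""
    node = {"pred": sum(y) / len(y)}
    if len(y) < 2 * min_size:
        return node  # too few observations to split into two valid children
    pairs = sorted(zip(x, y))
    best = None
    for i in range(min_size, len(pairs) - min_size + 1):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot cut between equal feature values
        loss = sse([p[1] for p in pairs[:i]]) + sse([p[1] for p in pairs[i:]])
        if best is None or loss < best[0]:
            best = (loss, (pairs[i][0] + pairs[i - 1][0]) / 2, i)
    if best is None or best[0] >= sse([p[1] for p in pairs]):
        return node  # no split improves the loss
    _, cutoff, i = best
    node["cutoff"] = cutoff
    node["left"] = grow([p[0] for p in pairs[:i]], [p[1] for p in pairs[:i]], min_size)
    node["right"] = grow([p[0] for p in pairs[i:]], [p[1] for p in pairs[i:]], min_size)
    return node

def predict(node, xi):
    """Walk from the root to a leaf and return that leaf's mean."""
    while "cutoff" in node:
        node = node["left"] if xi <= node["cutoff"] else node["right"]
    return node["pred"]

tree = grow([1, 2, 3, 10, 11, 12], [1.0, 1.2, 0.8, 5.0, 5.2, 4.8])
print(predict(tree, 2.5), predict(tree, 11.0))  # prints 1.0 5.0
```

Swapping the intercept for a regression fitted per node, or replacing the stopping rule with pruning, changes only the node-model and stopping steps; the greedy split search stays the same.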

Your model in (2) can be all kinds of things. For example, R's `party` package can do simple linear regression, survival analysis, multivariate regression and more. If you want more specific details, try reading the vignette; Section 3.2 explains the splitting criteria.
