Why is regression with Gradient Tree Boosting sometimes impacted by normalization (or scaling)?

I read that normalization is not required when using gradient tree boosting (see e.g. https://stackoverflow.com/q/43359169/1551810 and https://github.com/dmlc/xgboost/issues/357).

And I think I understand that in principle there is no need for normalization when boosting regression trees.
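As a quick sanity check of that claim (a small sketch of my own, not taken from the links above), scaling the features by a positive constant should leave the fitted trees, and hence the predictions, essentially unchanged, because splits only depend on the ordering of the feature values:

import numpy as np
import xgboost as xgb
from sklearn.datasets import load_boston

boston = load_boston()
X, y = boston['data'], boston['target']

# Multiplying the features by a positive constant preserves their ordering, so the
# learned splits and leaf values should be essentially the same (sanity-check sketch).
model_raw = xgb.XGBRegressor().fit(X, y)
model_scaled = xgb.XGBRegressor().fit(X * 1000, y)

print(np.allclose(model_raw.predict(X), model_scaled.predict(X * 1000)))  # expected: True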

Nevertheless, when using xgboost for regression trees, I see that scaling the target can have a significant impact on the (in-sample) error of the predictions. What is the reason for this?

An example using the Boston Housing dataset:

import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.metrics import mean_squared_error
from sklearn.datasets import load_boston

boston = load_boston()
y = boston['target']
X = boston['data']

# Fit on the rescaled target y / scale, undo the scaling on the predictions,
# and compare the resulting in-sample MSE across many orders of magnitude.
scales = pd.Index(np.logspace(-6, 6), name='scale')
data = {'reg:linear': [], 'reg:gamma': []}
for objective in ['reg:linear', 'reg:gamma']:
    for scale in scales:
        xgb_model = xgb.XGBRegressor(objective=objective).fit(X, y / scale)
        y_predicted = xgb_model.predict(X) * scale
        data[objective].append(mean_squared_error(y, y_predicted))

pd.DataFrame(data, index=scales).plot(loglog=True, grid=True).set(ylabel='MSE')
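Side note (not part of the original question): load_boston has been removed from recent scikit-learn releases; on a current install the same data can be loaded roughly as in the replacement snippet from scikit-learn's deprecation notice:

# Only needed on recent scikit-learn versions, where load_boston is no longer available.
import numpy as np
import pandas as pd

data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
X = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
y = raw_df.values[1::2, 2]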

Figure: dependency of the MSE on the scale factor.

A big part of the answer seems to be found in https://github.com/dmlc/xgboost/issues/799#issuecomment-181768076.

By default, base_score is set to 0.5, which seems a poor choice for regression problems. When the average of the target is much higher or lower than base_score, the first trees are spent just catching up with the average, and fewer trees are left to solve the real task.
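A rough back-of-the-envelope sketch of that effect (my own illustration, under the simplifying assumption that the loss is squared error and that each tree could fit the remaining constant offset exactly, so the offset shrinks by a factor of (1 - learning_rate) per round):

# Back-of-the-envelope sketch (assumptions: squared-error loss, each tree fits the
# remaining constant offset exactly, old XGBRegressor defaults learning_rate=0.1
# and n_estimators=100).
learning_rate, n_estimators = 0.1, 100
base_score = 0.5
scale = 1e6
scaled_target_mean = 22.5 / scale   # Boston target mean ~22.5, divided by the scale

gap = base_score - scaled_target_mean              # initial offset the trees must absorb
remaining_gap = gap * (1 - learning_rate) ** n_estimators
print(remaining_gap * scale)        # ~13 in original units, a large bias compared to y

So even in this idealized view, 100 rounds at the default learning rate cannot fully absorb an offset that is orders of magnitude larger than the (scaled) target itself.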

The solution thus seems simple: set base_score to the mean of the (scaled) target, so that its scale no longer affects the regression result.

For objective 'reg:gamma' this indeed seems to be the explanation, whereas for 'reg:linear' it brings only a partial improvement:

data = {'reg:linear': [], 'reg:gamma': [],
        'reg:linear - base_score': [], 'reg:gamma - base_score': []}

# Baseline: default base_score = 0.5.
for objective in ['reg:linear', 'reg:gamma']:
    for scale in scales:
        xgb_model = xgb.XGBRegressor(objective=objective).fit(X, y / scale)
        y_predicted = xgb_model.predict(X) * scale
        data[objective].append(mean_squared_error(y, y_predicted))

# Same experiment, but with base_score set to the mean of the scaled target.
for objective in ['reg:linear', 'reg:gamma']:
    for scale in scales:
        base_score = (y / scale).mean()
        xgb_model = xgb.XGBRegressor(objective=objective, base_score=base_score).fit(X, y / scale)
        y_predicted = xgb_model.predict(X) * scale
        data[objective + ' - base_score'].append(mean_squared_error(y, y_predicted))

styles = ['g-', 'r-', 'g--', 'r--']
pd.DataFrame(data, index=scales).plot(loglog=True, grid=True, style=styles).set(ylabel='MSE')

Figure: mean squared error as a function of the scale factor.

So the remaining question reduces to: why does scaling the target still sometimes affect the result with objective 'reg:linear', even after setting base_score to the mean of the (scaled) target?
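One hedged way to narrow this down (a diagnostic sketch of my own, not an answer) is to check whether the remaining effect is simply a convergence issue, by giving the booster more rounds and seeing whether the 'reg:linear' curve flattens; this reuses X, y, scales and mean_squared_error from the snippets above:

# Diagnostic sketch (assumption: the remaining scale dependence might be a convergence
# issue); compare the default number of rounds against a much larger budget.
probe = {'n_estimators=100': [], 'n_estimators=1000': []}
for n_estimators, key in [(100, 'n_estimators=100'), (1000, 'n_estimators=1000')]:
    for scale in scales:
        base_score = (y / scale).mean()
        xgb_model = xgb.XGBRegressor(objective='reg:linear', base_score=base_score,
                                     n_estimators=n_estimators).fit(X, y / scale)
        y_predicted = xgb_model.predict(X) * scale
        probe[key].append(mean_squared_error(y, y_predicted))

pd.DataFrame(probe, index=scales).plot(loglog=True, grid=True).set(ylabel='MSE')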
