I read that normalization is not required when using gradient tree boosting (see e.g. https://stackoverflow.com/q/43359169/1551810 and https://github.com/dmlc/xgboost/issues/357).

And I think I understand that in principle there is no need for normalization when boosting regression trees.
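The intuition is that tree splits depend only on the *ordering* of feature values, so a monotone rescaling of a feature leaves the chosen splits (and hence the predictions) unchanged. A minimal pure-NumPy sketch of that invariance, using an exhaustive squared-error decision stump written just for illustration (not xgboost's actual split finder):

```python
import numpy as np

def best_stump_split(x, y):
    """Exhaustively find the squared-error-optimal split point for a
    single feature; return the index of the split in sorted order."""
    order = np.argsort(x)
    ys = y[order]
    best_sse, best_i = np.inf, None
    for i in range(1, len(ys)):
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best_i = sse, i
    return best_i

rng = np.random.RandomState(0)
x = rng.rand(100)
y = (x > 0.6) * 5.0 + rng.rand(100)

# Rescaling the feature preserves the ordering, so the chosen split
# (and therefore the stump's predictions) is identical.
print(best_stump_split(x, y) == best_stump_split(x * 1000, y))  # → True
```

Note that this argument covers the *features* only; it says nothing about scaling the *target*, which is exactly where the question below arises.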

Nevertheless, using xgboost for regression trees, I see that scaling the target can have a significant impact on the (in-sample) error of the prediction result. What is the reason for this?

Example for the Boston Housing dataset:

```python
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.metrics import mean_squared_error
from sklearn.datasets import load_boston

boston = load_boston()
y = boston['target']
X = boston['data']

scales = pd.Index(np.logspace(-6, 6), name='scale')
data = {'reg:linear': [], 'reg:gamma': []}
for objective in ['reg:linear', 'reg:gamma']:
    for scale in scales:
        xgb_model = xgb.XGBRegressor(objective=objective).fit(X, y / scale)
        y_predicted = xgb_model.predict(X) * scale
        data[objective].append(mean_squared_error(y, y_predicted))

pd.DataFrame(data, index=scales).plot(loglog=True, grid=True).set(ylabel='MSE')
```


#### Best Answer

A big part of the answer seems to be found in https://github.com/dmlc/xgboost/issues/799#issuecomment-181768076.

By default, `base_score` is set to 0.5, which seems a poor choice for regression problems. When the mean of the target is much higher or lower than `base_score`, the first trees are spent just catching up to that mean, and fewer trees are left to solve the real task.
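This effect can be sketched without xgboost at all: treat each boosting round as fitting a shrunken constant (the mean residual), as squared-loss boosting would with depth-0 trees. The toy model below (my own illustration, with an assumed learning rate of 0.1, not xgboost's internals) counts how many rounds are burned just closing the gap between `base_score` and the target mean:

```python
# Toy model of squared-loss boosting where every "tree" can only fit
# the mean residual (a constant), shrunk by the learning rate.
def rounds_to_reach(target_mean, base_score=0.5, learning_rate=0.1, tol=0.01):
    """Count boosting rounds until the constant prediction is within
    a relative tolerance tol of the target mean."""
    pred = base_score
    rounds = 0
    while abs(pred - target_mean) > tol * abs(target_mean):
        pred += learning_rate * (target_mean - pred)  # fit the mean residual
        rounds += 1
    return rounds

print(rounds_to_reach(22.5))     # Boston target mean (scale 1)  → 44
print(rounds_to_reach(22.5e-6))  # target divided by 1e6         → 139
```

With sklearn-style defaults of around 100 estimators, the second case would exhaust the whole ensemble before even reaching the mean, which matches the large errors seen at extreme scales.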

The solution thus seems simple: set `base_score` to the mean of the target so that its scale no longer affects the regression result.

For objective `'reg:gamma'` this indeed seems to be the key, whereas for `'reg:linear'` it gives only a partial improvement:

```python
data = {'reg:linear': [], 'reg:gamma': [],
        'reg:linear - base_score': [], 'reg:gamma - base_score': []}
for objective in ['reg:linear', 'reg:gamma']:
    for scale in scales:
        xgb_model = xgb.XGBRegressor(objective=objective).fit(X, y / scale)
        y_predicted = xgb_model.predict(X) * scale
        data[objective].append(mean_squared_error(y, y_predicted))
for objective in ['reg:linear', 'reg:gamma']:
    for scale in scales:
        base_score = (y / scale).mean()
        xgb_model = xgb.XGBRegressor(objective=objective,
                                     base_score=base_score).fit(X, y / scale)
        y_predicted = xgb_model.predict(X) * scale
        data[objective + ' - base_score'].append(mean_squared_error(y, y_predicted))

styles = ['g-', 'r-', 'g--', 'r--']
pd.DataFrame(data, index=scales).plot(loglog=True, grid=True,
                                      style=styles).set(ylabel='MSE')
```

So the remaining question reduces to: why does scaling the target still sometimes have an impact with objective `'reg:linear'`, even after adjusting `base_score` to the mean of the (scaled) target?
