Solved – XGBoost tree “Value” output:

Using the following R code, I obtain a decision tree from the agaricus dataset:

    library(xgboost)

    data(agaricus.train, package = 'xgboost')

    bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label,
                   max_depth = 3, eta = 1, nthread = 2, nrounds = 2,
                   objective = "binary:logistic")

    # plot all the trees
    xgb.plot.tree(model = bst)

    # plot only the first tree and display the node ID:
    xgb.plot.tree(model = bst, trees = 0, show_node_id = TRUE)

I want to understand the "value" output of the tree more clearly (the third line in each oval-shaped node). Here, tree 0, leaf 7 gives a value of 1.90174532 (the first terminal node in the image). I want to know whether this value is a log-odds score. That is, every observation that follows the upper path of the decision tree receives a log-odds score of 1.90174532. In the next tree, the observations fall into different leaves depending on each observation's characteristics and receive a "new" value. We then sum these values across all trees to obtain a final log-odds score, which can be converted to a predicted probability using the logistic function.

Is my intuition correct? Does value = log-odds?

[Image: plot of tree 0 from xgb.plot.tree, showing node IDs]

The "value" is a leaf's contribution to the logit (the log-odds). The logit for a sample is the sum of the "value" of every leaf that the sample lands in. Because XGBoost is an ensemble, a sample terminates in exactly one leaf per tree, and gradient boosted ensembles sum the predictions of all trees.
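One way to check this numerically is to look up the leaf each observation reaches in every tree, add up those leaf values, and compare the total with the raw margin that XGBoost reports. The following is only a sketch. It assumes the same version of the xgboost R package that the question's code runs under, and that leaf values appear in the `Quality` column of `xgb.model.dt.tree()` (the fallback to a `Gain` column is a guess for newer releases). `verbose = 0` and the helper names `value_col`, `leaves`, and `manual_logit` are additions for this example.

```r
library(xgboost)
data(agaricus.train, package = "xgboost")

# Same fit as in the question (verbose = 0 only silences the training log).
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label,
               max_depth = 3, eta = 1, nthread = 2, nrounds = 2,
               objective = "binary:logistic", verbose = 0)

# One row per node. For leaf rows the "value" sits in the Quality column
# (assumption: some newer releases name this column Gain instead).
nodes <- as.data.frame(xgb.model.dt.tree(model = bst))
value_col <- if ("Quality" %in% names(nodes)) "Quality" else "Gain"
leaves <- nodes[nodes$Feature == "Leaf", ]

# Node index of the leaf each observation reaches: one column per tree.
leaf_idx <- predict(bst, agaricus.train$data, predleaf = TRUE)

# Look up the value of each reached leaf and sum across trees.
manual_logit <- numeric(nrow(leaf_idx))
for (j in seq_len(ncol(leaf_idx))) {
  tree_leaves <- leaves[leaves$Tree == j - 1, ]   # trees are 0-indexed
  hit <- match(leaf_idx[, j], tree_leaves$Node)
  manual_logit <- manual_logit + tree_leaves[[value_col]][hit]
}

# Raw (untransformed) prediction reported by XGBoost itself.
margin <- predict(bst, agaricus.train$data, outputmargin = TRUE)

# Any remaining difference should be a constant offset from the base score.
summary(manual_logit - margin)
```

If the base score is 0.5 (which is 0 on the logit scale), the differences should be zero up to floating-point error. A nonzero but constant difference would indicate a base-score offset rather than a mistake in the leaf sums.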

Then the logit can be used in the ordinary way, such as computing the predicted probability of class membership.
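As a worked illustration of that last step, write v_{t,l} for the "value" of leaf l in tree t, and l_t(x) for the leaf that observation x reaches in tree t. This notation is introduced here for the example and is not taken from the XGBoost documentation. With T trees, and assuming the base score adds no extra offset:

```latex
z(x) = \sum_{t=0}^{T-1} v_{t,\,\ell_t(x)}, \qquad
p(x) = \frac{1}{1 + e^{-z(x)}}
```

For example, if the leaf value 1.90174532 from the question were the only contribution, the predicted probability would be 1 / (1 + e^{-1.90174532}) ≈ 0.870.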

More information about gradient boosted trees generally and XGBoost specifically can be found in or .
