When I use GradientBoostingClassifier from scikit-learn, I find there is a parameter max_depth, which controls the maximum depth of the individual regression trees. What exactly does that parameter do? If max_depth=3, does it mean that a regression tree stops growing once it reaches a depth of 3? In that case, is this parameter basically used to control the complexity of the regression trees?
Best Answer
You are right. max_depth bounds the maximum depth of each individual regression tree built at every stage of gradient boosting (it is not related to Random Forest). The default value (3) is usually a good starting point.
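You can verify this bound directly on a fitted model: clf.estimators_ holds the individual regression trees, and each fitted tree exposes get_depth(). A minimal sketch, using toy data made up for illustration:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# toy data, invented for this example
rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = (X.sum(axis=1) > 1.5).astype(int)

clf = GradientBoostingClassifier(max_depth=3, n_estimators=10, random_state=0)
clf.fit(X, y)

# clf.estimators_ is a 2-D array of the fitted regression trees;
# no tree in the ensemble is deeper than max_depth
depths = [t.get_depth() for t in clf.estimators_.ravel()]
print(max(depths))  # at most 3
```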
To see what the decision trees constructed by gradient boosting look like, you can use something like this (note: sklearn.externals.six has been removed from scikit-learn, so use io.StringIO instead, and recent versions of pydot return a list from graph_from_dot_data):

from io import StringIO

import numpy as np
import pydot
from sklearn import tree

# generate a toy training sample
training_points = np.random.rand(20, 3)
training_values = np.sum(training_points, axis=1) > 0.8 * np.random.rand(20)

# fit a decision tree
decision_tree = tree.DecisionTreeClassifier(max_depth=3)
model = decision_tree.fit(training_points, training_values)

# export the tree and save it as a PDF
dot_data = StringIO()
tree.export_graphviz(model, out_file=dot_data)
(graph,) = pydot.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("decision_tree.pdf")