# Solved – How to obtain regularization parameter when pruning decision trees

I'm having trouble understanding exactly how to obtain the regularization parameter when pruning a decision tree with the minimal cost complexity approach. Assume the cost complexity function is represented as

\$\$C_\alpha(T) = R(T) + \alpha|T|,\$\$

where \$\alpha\$ is the regularization parameter to be chosen and \$|T|\$ is the number of terminal nodes of the tree \$T\$.

Using the entire data set, we apply weakest-link cutting to obtain a set of \$\alpha\$'s and the corresponding sub-trees that minimize the cost for each \$\alpha\$.
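As a concrete sketch of this step, scikit-learn exposes weakest-link cutting via `cost_complexity_pruning_path` (the choice of library and data set here is just for illustration; the question itself is library-agnostic):

```python
# Sketch: obtain the sequence of alphas (and subtree impurities) produced by
# weakest-link cutting on the entire data set.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(random_state=0)
path = tree.cost_complexity_pruning_path(X, y)  # weakest-link cutting

# path.ccp_alphas is a non-decreasing sequence; each alpha corresponds to
# the subtree that minimizes C_alpha(T) over that interval of alpha values.
ccp_alphas = path.ccp_alphas
```

Each entry of `ccp_alphas` marks the point at which the next-weakest link is cut, so the sequence of alphas indexes the nested sequence of optimal sub-trees.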

Next, we generally use K-fold cross-validation. This is where the pruning approach becomes unclear to me. For K-fold CV we estimate K trees. I would have thought we would use the original \$\alpha\$'s, obtained from the entire sample, to identify the sequence of optimal sub-trees in each fold. We would then proceed with CV, selecting the \$\alpha\$ with the smallest average error.
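The procedure I have in mind could be sketched like this (scikit-learn usage is an assumption for illustration; the alphas come from the full sample and are then scored by cross-validation):

```python
# Sketch: score each full-sample alpha by K-fold CV and pick the best one.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Alphas from weakest-link cutting on the entire data set
full_tree = DecisionTreeClassifier(random_state=0)
ccp_alphas = full_tree.cost_complexity_pruning_path(X, y).ccp_alphas

# For each candidate alpha, fit a pruned tree within each fold and average
# the validation accuracy; select the alpha with the best mean CV score.
mean_scores = [
    cross_val_score(
        DecisionTreeClassifier(random_state=0, ccp_alpha=a), X, y, cv=5
    ).mean()
    for a in ccp_alphas
]
best_alpha = ccp_alphas[int(np.argmax(mean_scores))]
```

Note that this sidesteps the issue raised below: here a single shared grid of \$\alpha\$'s is evaluated in every fold, rather than deriving a separate alpha sequence per fold.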

However, several sources (these lecture notes and Introduction to Statistical Learning, p. 309) seem to suggest that within each fold a new set of \$\alpha\$'s is obtained. Let's refer to the set of \$\alpha\$'s obtained within the kth fold as \$\alpha^{(k)}\$. This does not make sense to me. It is unlikely that each entry of \$\alpha\$ (i.e. the set of \$\alpha\$'s obtained from the entire data set) will coincide with an entry of \$\alpha^{(k)}\$, or even that the elements of \$\alpha^{(k)}\$ will coincide with those of \$\alpha^{(j)}\$. How can we pick the entry of \$\alpha\$ that minimizes the cost when \$\alpha^{(k)}\$ potentially shares no entries with \$\alpha\$?
