# Solved – Motivation behind random forest algorithm steps

The method that I'm familiar with for constructing a random forest is as follows:
(from http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm)

To build a tree in the forest we:

1. Draw a bootstrap sample of size N, where N is the size of our training set. Use this bootstrapped sample as the training set for this tree.
2. At each node of the tree randomly select m of our M features. Select the best of these m features to split on. (where m is a parameter of our Random Forest)
3. Grow each tree to the largest extent possible — i.e. no pruning.
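For concreteness, the three steps above can be sketched as follows. This is only an illustrative implementation, not Breiman's reference code: it leans on scikit-learn's `DecisionTreeClassifier`, whose `max_features` parameter performs the per-node random feature selection of step 2, and whose default `max_depth=None` gives the unpruned trees of step 3. The function names and the √M default for m are my own choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def build_forest(X, y, n_trees=100, m=None, seed=None):
    """Build a list of trees following steps 1-3."""
    rng = np.random.default_rng(seed)
    N, M = X.shape
    if m is None:
        m = max(1, int(np.sqrt(M)))  # a common default for classification
    trees = []
    for _ in range(n_trees):
        # Step 1: bootstrap sample of size N (sampling with replacement)
        idx = rng.integers(0, N, size=N)
        # Step 2: max_features=m makes each split consider m random features
        # Step 3: max_depth=None (the default) grows the tree fully, no pruning
        tree = DecisionTreeClassifier(max_features=m)
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def forest_predict(trees, X):
    """Majority vote over the trees (assumes integer class labels >= 0)."""
    votes = np.stack([t.predict(X) for t in trees]).astype(int)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```

A usage sketch: `trees = build_forest(X_train, y_train, n_trees=100)` followed by `forest_predict(trees, X_test)` gives the ensemble's majority-vote prediction.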

While this algorithm makes sense at a procedural level and certainly produces good results, I'm not clear what the theoretical motivation is behind steps 1, 2, and 3. Could someone explain what motivated this procedure and why it works so well?

For example: why do we need to perform step 1? It doesn't seem like we're bootstrapping for its usual purpose of variance reduction.
