Solved – Why do we use random sampling with replacement when implementing a random forest

Let's say we want to build a random forest. Wikipedia says that we use random sampling with replacement to do bagging. I don't understand why we can't use random sampling without replacement.

Random forests are based on the concept of bootstrap aggregation (aka bagging). Each tree is trained on a bootstrap sample: n examples drawn with replacement from the n training examples. This is the theoretical foundation that shows that sampling with replacement and then averaging the resulting ensemble reduces the variance of the forest without increasing the bias.
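To make the variance-reduction claim concrete, here is a minimal stdlib-only sketch under assumptions of our own: a toy 1-D problem y = x + noise, and a 1-nearest-neighbour regressor standing in for a fully grown decision tree (both are unstable, high-variance learners). `make_data`, `one_nn` and `bagged_one_nn` are hypothetical helpers, not part of any library, and the sketch omits the per-split feature subsampling that a real random forest adds on top of bagging.

```python
import random
import statistics

def make_data(n, rng):
    """Hypothetical toy problem: n noisy samples of y = x on [0, 1]."""
    data = []
    for _ in range(n):
        x = rng.random()
        data.append((x, x + rng.gauss(0.0, 0.5)))
    return data

def one_nn(train, x0):
    """High-variance base learner: the y of the training point nearest to x0."""
    return min(train, key=lambda p: abs(p[0] - x0))[1]

def bagged_one_nn(train, x0, n_models, rng):
    """Average one_nn over n_models bootstrap samples (drawn WITH replacement)."""
    n = len(train)
    preds = []
    for _ in range(n_models):
        boot = [train[rng.randrange(n)] for _ in range(n)]
        preds.append(one_nn(boot, x0))
    return statistics.fmean(preds)

rng = random.Random(42)
x0 = 0.5  # query point; the true E[y | x0] is 0.5
single, bagged = [], []
for _ in range(200):  # 200 independent training sets
    train = make_data(100, rng)
    single.append(one_nn(train, x0))
    bagged.append(bagged_one_nn(train, x0, n_models=50, rng=rng))

var_single = statistics.pvariance(single)
var_bagged = statistics.pvariance(bagged)
print(f"single: mean={statistics.fmean(single):.3f}  variance={var_single:.3f}")
print(f"bagged: mean={statistics.fmean(bagged):.3f}  variance={var_bagged:.3f}")
```

Both estimators should be centred near 0.5 (bias essentially unchanged), while the bagged predictions should vary noticeably less from one training set to the next.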

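The same theoretical property does not hold if you sample without replacement. If each tree drew a full-size sample of n examples without replacement, every tree would see exactly the same training set, so the ensemble would lose the diversity that bagging relies on and its variance would stay high.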
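A quick way to see the full-size case is to draw five "training sets" each way and compare which rows they contain. This is a sketch with a hypothetical helper `sample_indices`; it only uses Python's `random` module, and it compares the sets of distinct rows, ignoring how often a row repeats, which is enough to show whether the samples differ.

```python
import random

def sample_indices(n, size, replace, rng):
    """Draw `size` row indices from range(n), with or without replacement."""
    if replace:
        return [rng.randrange(n) for _ in range(size)]
    return rng.sample(range(n), size)  # raises ValueError if size > n

n = 2000
rng = random.Random(0)

# Full-size samples WITHOUT replacement: every "tree" gets exactly the same rows.
no_repl = [frozenset(sample_indices(n, n, False, rng)) for _ in range(5)]
print(len(set(no_repl)))  # 1 distinct training set

# Full-size samples WITH replacement: each tree sees a different subset of rows,
# roughly 63% of them (the limit 1 - 1/e for large n).
with_repl = [frozenset(sample_indices(n, n, True, rng)) for _ in range(5)]
print(len(set(with_repl)))
print([round(len(s) / n, 3) for s in with_repl])
```

Note that this only demonstrates the case where each tree's sample is as large as the training set.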

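Let's say we're building a random forest with 1,000 trees, and our training set has 2,000 examples. If we sample without replacement across the whole forest, so that no example is reused by another tree, we have to split the 2,000 examples among the 1,000 trees, which leaves only 2 examples per tree. This is obviously impractical.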
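The arithmetic above can be checked with a short sketch: shuffle the 2,000 example indices and deal them into 1,000 disjoint subsets, one per tree. `disjoint_partition` is a hypothetical helper written for this illustration.

```python
import random

def disjoint_partition(n_examples, n_trees, rng):
    """Shuffle the example indices and deal them into n_trees disjoint subsets,
    so that no example is reused by another tree."""
    idx = list(range(n_examples))
    rng.shuffle(idx)
    return [idx[t::n_trees] for t in range(n_trees)]

parts = disjoint_partition(2000, 1000, random.Random(0))
sizes = {len(p) for p in parts}
print(sizes)  # {2}: every tree would be trained on just 2 examples
```

With bootstrap sampling, by contrast, each of the 1,000 trees gets its own 2,000 draws from the full training set.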

Hope this helps.
