Let's say we want to build a random forest. Wikipedia says that bagging uses random sampling with replacement. I don't understand why we can't use random sampling without replacement.
Best Answer
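Random forests are based on the concept of bootstrap aggregation (a.k.a. bagging). The idea behind it is that drawing each tree's training set with replacement and then averaging the trees reduces the variance of the forest without noticeably increasing the bias.
You don't get the same benefit if you sample without replacement at full size. If every tree draws all n examples without replacement, every tree sees exactly the same training set, so the only remaining source of diversity is the random feature selection. The trees end up highly correlated, and averaging them removes much less variance.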
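A standard way to see why correlation between trees matters is the variance of an average of identically distributed, correlated predictors. The symbols B (number of trees), \sigma^2 (variance of a single tree) and \rho (pairwise correlation between trees) are introduced here for illustration and are not part of the original answer:

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
  = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
```

Adding more trees shrinks only the second term. The first term stays, so the more alike the training sets are, the larger \rho is and the less the ensemble gains from averaging.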
Let's say we're building a random forest with 1,000 trees and our training set has 2,000 examples. If we sampled without replacement across the whole forest, so that no example is used by more than one tree, each tree would be trained on only 2 examples. This is obviously impractical.
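To make the numbers concrete, here is a minimal sketch using NumPy. The variable names and the seed are illustrative assumptions, not taken from any particular random forest implementation. It compares a bootstrap sample, a disjoint split of the data across trees, and a full-size draw without replacement.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
n_examples = 2000
n_trees = 1000

# Bootstrap: one tree draws n_examples indices WITH replacement.
bootstrap_idx = rng.integers(0, n_examples, size=n_examples)
unique_fraction = np.unique(bootstrap_idx).size / n_examples

# Disjoint split: no example is shared between trees.
per_tree_disjoint = n_examples // n_trees

# Full-size draw WITHOUT replacement: just a permutation of the whole set,
# so every tree would see exactly the same examples.
full_idx = rng.choice(n_examples, size=n_examples, replace=False)
covers_whole_set = np.unique(full_idx).size == n_examples

print(f"distinct examples in one bootstrap sample: {unique_fraction:.1%}")
print(f"examples per tree with a disjoint split: {per_tree_disjoint}")
print(f"full-size draw without replacement covers every example: {covers_whole_set}")
```

A bootstrap sample of size n contains, on average, about 63% of the distinct examples, so each tree gets a full-size training set that still differs from the others.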
Hope this helps.