I am reading about the idea of bagging (bootstrap aggregating). I have no trouble understanding how the variance of the prediction can be reduced. The simple picture is that if you have predictions Z_1 through Z_n, one per bootstrapped sample, the standard error of their mean will be reduced by a factor of sqrt(n).
However, if this picture is true, the mean of the bootstrapped sample means should be an unbiased estimate of the mean of the Z_i. How, then, can the accuracy be improved as well?
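The "simple picture" above can be checked with a toy simulation (a hypothetical setup, not from the post): take the prediction Z_i to be the mean of one bootstrap resample of a fixed data set, and compare the spread of a single Z_i with the spread of an average of 50 of them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data; the "prediction" Z_i is just the mean of one
# bootstrap resample of this fixed data set.
data = rng.normal(loc=5.0, scale=2.0, size=200)

def boot_mean(rng):
    # One bootstrap prediction: mean of a resample drawn with replacement
    return rng.choice(data, size=data.size, replace=True).mean()

# Spread of a single bootstrap prediction vs. an average of 50 of them,
# estimated over 500 independent repetitions
single = np.array([boot_mean(rng) for _ in range(500)])
bagged = np.array([np.mean([boot_mean(rng) for _ in range(50)])
                   for _ in range(500)])

print(single.mean(), single.std())  # centred near data.mean()
print(bagged.mean(), bagged.std())  # similar centre, much smaller spread
```

Both estimates are centred in the same place (the averaging adds no bias), while the spread of the bagged estimate shrinks by roughly sqrt(50), which is exactly the question: the bias is untouched, only the variance drops.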
Because bagging equalizes influence: the influence of so-called leverage points (points that have a large impact on the overall model) decreases compared to non-bagged models. This is good when the leverage points are bad for the model's performance, which is not always the case. For an example of a leverage point, consider an outlier in least-squares regression.
The variance-reduction argument that most people know about does not explain everything as nicely (e.g. in some cases bagging actually increases variance).
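The least-squares example can be sketched numerically (a hypothetical setup, not from the post). A single high-leverage outlier is absent from roughly 37% of bootstrap resamples, so the bagged slope averages over many fits that never see it, and its influence on the final model is diluted relative to a single fit:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: a clean linear trend y = 2x plus one point that is
# both far out in x (high leverage) and far off the trend in y
n = 40
x = rng.uniform(0, 1, n)
y = 2.0 * x + rng.normal(0, 0.1, n)
x[-1], y[-1] = 3.0, -4.0  # the leverage point

def slope(x, y):
    # Least-squares slope (with intercept)
    return np.polyfit(x, y, 1)[0]

plain = slope(x, y)  # single fit: strongly dragged by the outlier

# Bagged slope: average the slope over many bootstrap resamples;
# ~37% of resamples omit the outlier entirely
B = 2000
idx = rng.integers(0, n, size=(B, n))
bagged = np.mean([slope(x[i], y[i]) for i in idx])

print(plain, bagged)  # bagged is typically closer to the clean slope 2.0
```

The point is not that bagging is a robust-regression method, but that averaging over resamples with varying (often zero) copies of the leverage point equalizes its weight, which is why accuracy can improve even though the bagged estimate is not less biased in the simple sense.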