I'm a bit confused. I was reading a paper that explained that bagging greatly reduces variance and only slightly increases bias. I don't understand how it reduces variance. I know what variance and bias are: bias is the model's inability to fit the data, and variance is something similar to overfitting. I just don't see how bagging reduces variance.

#### Best Answer

Informally, when a model has too much variance, it can fit the data "too well". That means that for different training data, the learning algorithm will find different model parameters; in other words, the learned parameters have high variance with respect to the training set.
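To make the "different data, different fit" idea concrete, here is a minimal sketch. Everything in it is an assumption chosen for illustration, not something from the paper: the data-generating process $y = \sin(2\pi x) + \text{noise}$, the fixed input grid, the noise level, and the helper names `sample_training_set` and `prediction_variance`. It refits a rigid model (degree 1) and a flexible one (degree 9) on many fresh training sets and measures how much the prediction at a single point moves around.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

# Assumed setup: inputs fixed on a grid, only the noisy targets are redrawn.
X_GRID = np.linspace(0.0, 1.0, 20)
NOISE_STD = 0.3

def sample_training_set():
    """Draw fresh noisy targets from the hypothetical process y = sin(2*pi*x) + noise."""
    y = np.sin(2 * np.pi * X_GRID) + rng.normal(0.0, NOISE_STD, size=X_GRID.size)
    return X_GRID, y

def prediction_variance(degree, x0=0.5, trials=300):
    """Variance, across training sets, of a degree-`degree` polynomial fit evaluated at x0."""
    preds = []
    for _ in range(trials):
        x, y = sample_training_set()
        model = Polynomial.fit(x, y, deg=degree)  # least-squares polynomial fit
        preds.append(model(x0))
    return np.var(preds)

low = prediction_variance(degree=1)   # rigid model
high = prediction_variance(degree=9)  # flexible model
print(f"degree 1: variance of prediction = {low:.4f}")
print(f"degree 9: variance of prediction = {high:.4f}")
```

Under these assumptions the flexible model's prediction should vary noticeably more from one training set to the next than the rigid model's, which is the "high variance" being described.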

You can think of it this way: the data are sampled from some real-world probability distribution, and the model learns parameters that depend on the sampled data. Hence there is a conditional probability distribution over the learned parameters given the data. This distribution has some variance, sometimes too high. But when you average $N$ models with different sets of parameters learned from different training sets, it is as if you had sampled from this conditional distribution $N$ times. The average of $N$ independent samples from a distribution with finite variance has smaller variance than a single sample (for $N > 1$). For intuition, take a Gaussian with mean $0$ and $\sigma = 1$: a single sample has mean $0$ and variance $1$. If you sample $N$ times and average the results, the mean is still $0$, but the variance is $\frac{1}{N}$. In bagging, the $N$ models are trained on bootstrap resamples of the same training set, so they are correlated and the reduction is smaller than $\frac{1}{N}$, but the same averaging effect is at work.
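The Gaussian example can be checked numerically. The sketch below is only an illustration (the helper name `variance_of_average`, the number of trials, and the seed are arbitrary choices): it averages $N$ independent standard normal draws many times and compares the empirical variance of that average with $\frac{1}{N}$.

```python
import numpy as np

rng = np.random.default_rng(0)

def variance_of_average(n, trials=100_000):
    """Empirical variance of the mean of n independent N(0, 1) draws."""
    draws = rng.normal(loc=0.0, scale=1.0, size=(trials, n))
    return draws.mean(axis=1).var()

# Compare the simulated variance with the theoretical value 1/N.
results = {n: variance_of_average(n) for n in (1, 4, 16)}
for n, v in results.items():
    print(f"N={n:2d}  empirical variance={v:.4f}  theory 1/N={1 / n:.4f}")
```

The printed values should sit close to $1$, $0.25$ and $0.0625$. Note that this assumes independent draws; it does not model the correlation between bagged models.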

Please keep in mind that this is only a very informal intuition; it would be best to read about bias and variance in a reliable source. I recommend The Elements of Statistical Learning (2nd edition): http://www-stat.stanford.edu/~tibs/ElemStatLearn/

You can download the book for free, and there is a whole chapter on bias/variance decomposition.