In machine learning, why does bagging increase bias? I've read that using less data would lead to a worse estimate of the parameters, but isn't the expected value of the parameter constant regardless of sample size?
In principle, bagging is performed to reduce the variance of the fitted values, as it increases their stability. In addition, as a rule of thumb I would say that: "the magnitudes of the bias are roughly the same for the bagged and the original procedure" (Bühlmann & Yu, 2002). That is because bagging allows us to approximate relatively complex response surfaces by effectively smoothing over the learners' decision boundaries.
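To make the variance/bias claim concrete, here is a minimal simulation sketch. Everything in it is my own assumption for illustration (the 1-nearest-neighbour base learner, the sine target `true_f`, the query point `x0`, the noise level and sample sizes); none of it is taken from Bühlmann & Yu. It repeatedly draws a training set, predicts at `x0` with a single learner and with a bagged ensemble, and then compares the empirical bias and variance of the two predictions.

```python
import math
import random
import statistics


def true_f(x):
    # Assumed regression function, chosen only for illustration.
    return math.sin(2 * math.pi * x)


def one_nn_predict(xs, ys, x0):
    # 1-nearest-neighbour prediction: response of the closest training point.
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
    return ys[i]


def bagged_predict(xs, ys, x0, n_boot, rng):
    # Average the base learner's prediction over bootstrap resamples.
    n = len(xs)
    preds = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        preds.append(one_nn_predict([xs[i] for i in idx],
                                    [ys[i] for i in idx], x0))
    return statistics.fmean(preds)


def simulate(n_sims=300, n=50, n_boot=50, x0=0.3, noise_sd=1.0, seed=0):
    # Monte Carlo estimate of bias and variance at x0 for both procedures.
    rng = random.Random(seed)
    single, bagged = [], []
    for _ in range(n_sims):
        xs = [rng.random() for _ in range(n)]
        ys = [true_f(x) + rng.gauss(0, noise_sd) for x in xs]
        single.append(one_nn_predict(xs, ys, x0))
        bagged.append(bagged_predict(xs, ys, x0, n_boot, rng))
    target = true_f(x0)
    return {
        name: {"bias": statistics.fmean(p) - target,
               "variance": statistics.pvariance(p)}
        for name, p in (("single", single), ("bagged", bagged))
    }


if __name__ == "__main__":
    for name, stats in simulate().items():
        print(name, stats)
```

In a setup like this I would expect the bagged variance to be clearly smaller while the two bias estimates stay of similar magnitude, but that is specific to this toy example and should not be read as a general guarantee.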
That said, you raise a good point about bagging "using less data"; my understanding is that this is a problem when the learners are potentially weak. Having less data makes the learning task more difficult. An obvious example would be an imbalanced dataset where positive examples are rather rare; in that case a simple majority rule for the bagging ensemble will probably be more harmful than helpful, as it will indeed be more likely to misclassify the rare class. Berk's "Statistical Learning from a Regression Perspective", Sect. 4.4 on "Some Limitations of Bagging", touches upon this too. Let me note that this deteriorated performance is not totally surprising; bagging, like any other procedure, is not a silver bullet, so it is expected that there will be cases where an otherwise helpful procedure (here bagging) makes things worse.
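As a rough illustration of what "using less data" means for a single bootstrap resample, the sketch below computes two standard quantities: the expected fraction of distinct observations in a resample of size n drawn with replacement, 1 - (1 - 1/n)^n, which approaches 1 - 1/e (about 63%), and the probability that a resample contains none of k rare positive cases, (1 - k/n)^n. The function names and the example values of n and k are my own, chosen only for illustration.

```python
import math
import random


def expected_unique_fraction(n):
    # Each observation is missed by all n draws with probability (1 - 1/n)^n.
    return 1 - (1 - 1 / n) ** n


def prob_no_positive(n, k):
    # Probability that none of the k positive cases is drawn in n draws.
    return (1 - k / n) ** n


def simulated_unique_fraction(n, n_boot=2000, seed=0):
    # Monte Carlo check of expected_unique_fraction.
    rng = random.Random(seed)
    total = 0
    for _ in range(n_boot):
        total += len({rng.randrange(n) for _ in range(n)})
    return total / (n_boot * n)


if __name__ == "__main__":
    print(expected_unique_fraction(1000), 1 - math.exp(-1))
    print(simulated_unique_fraction(200))
    print(prob_no_positive(100, 2))
```

With, say, 2 positives among 100 observations, a noticeable share of the base learners never sees a positive case at all, which is one way to see why a plain majority vote can hurt the rare class.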
I think that the Bühlmann & Yu (2002) paper "Analyzing bagging" is a canonical reference on the matter if you want to explore further. I also liked the Strobl et al. (2007) paper "Bias in random forest variable importance measures: Illustrations, sources and a solution"; it focuses mostly on variable selection but makes a good point about how bagging affects the bias in that task.