Solved – the difference between bias and residuals

I'm aware of the bias-variance trade-off.
Intuitively, I understand how, as the model becomes more complex, the bias decreases while the variance increases, at least past a certain point of complexity.
But I don't really understand bias.

For example:

If we have a predictor variable x and we want to estimate y with a linear model, say ŷ = xβ̂, is it that

Bias = E[ŷ] - y

residual = xβ̂ - y,

which looks to me like the same thing as E[ŷ] - y?

Bias is a property of an estimator or a statistic, NOT of a stochastic realization. It means that the estimator or statistic is computed in a way that makes it SYSTEMATICALLY different from the quantity it is supposed to summarize / estimate (the simulation sketch after the list below makes this distinction concrete).

These things are NOT examples of bias:

  • Residuals for a single experiment
  • The difference of a parameter estimate or prediction from the truth for a single experiment (unless it is systematic)
  • Anything else that is stochastic and not systematic
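
To make that distinction concrete, here is a minimal simulation sketch (NumPy; the estimators, sample size, and true value are my own choices for illustration). It compares the error of a single experiment, which is just a stochastic realization, with the bias, which only shows up as a systematic average over many repeated experiments: the sample mean is unbiased, while a shrunken version of it is systematically off.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 2.0            # true quantity we want to estimate
n = 30              # sample size in each experiment
n_experiments = 10_000

errors_unbiased = []  # error of the sample mean (an unbiased estimator of mu)
errors_shrunk = []    # error of 0.8 * sample mean (a deliberately biased estimator)

for _ in range(n_experiments):
    x = rng.normal(loc=mu, scale=1.0, size=n)
    errors_unbiased.append(x.mean() - mu)
    errors_shrunk.append(0.8 * x.mean() - mu)

# In any single experiment both errors are nonzero and stochastic ...
print("errors in one experiment:", errors_unbiased[0], errors_shrunk[0])

# ... but averaged over many experiments only the shrunken estimator
# is systematically off (bias close to 0.8 * mu - mu = -0.4):
print("estimated bias, sample mean:       ", np.mean(errors_unbiased))
print("estimated bias, shrunken estimator:", np.mean(errors_shrunk))
```

In any single run of the loop neither error tells you anything about bias; only the average over repeated experiments does.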

The bias-variance trade-off is maybe not an ideal name; it might better have been called the interpolation/extrapolation trade-off. Anyway, the motivation for the name is that when adding more parameters / complexity, you have (as the sketch after this list illustrates):

  • Less systematic error (bias) in your model (supposedly because it is more flexible; I would argue it depends on what you call error / bias)
  • More variance in the estimation of the model parameters (because it is more flexible)
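
As a rough illustration of that trade-off (again only a sketch; the sine target, noise level, polynomial degrees, and test point are assumptions of mine, not from the original answer), the following fits a rigid and a flexible polynomial to many repeated noisy datasets and looks at the squared bias and the variance of their predictions at one test point. The degree-1 fit should show a large squared bias and a small variance, and the degree-9 fit roughly the opposite.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # true (nonlinear) regression function
    return np.sin(2 * np.pi * x)

x_train = np.linspace(0.0, 1.0, 20)
x_test = 0.35            # single point where we track the predictions
n_datasets = 2_000

preds = {1: [], 9: []}   # degree-1 (rigid) vs degree-9 (flexible) polynomial fits
for _ in range(n_datasets):
    y_train = f(x_train) + rng.normal(scale=0.3, size=x_train.size)
    for degree in preds:
        coeffs = np.polyfit(x_train, y_train, deg=degree)
        preds[degree].append(np.polyval(coeffs, x_test))

for degree, p in preds.items():
    p = np.asarray(p)
    bias = p.mean() - f(x_test)   # systematic error of the average prediction
    print(f"degree {degree}: squared bias = {bias**2:.4f}, variance = {p.var():.4f}")
```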
