How does getting a smaller feature space result in combating high variance in a machine learning algorithm? What I currently understand is the following:

With fewer features, the hypothesis can fit the data more easily, and therefore it helps against high bias (underfitting).

With more features, the hypothesis has a harder time fitting the data, and therefore it helps against high variance (overfitting).

Please point out to me what is wrong with my thought process.


#### Best Answer

You have it backwards. I'm not sure what your reasoning is for your connections (you just stated a belief, not the reasons for those beliefs), but here are the correct chains of reasoning.

When you expand the number of features under consideration, you create more options for the final fitted model. With more options, you can use that greater flexibility to find a result that is closer to recreating the training data exactly. The more closely you recreate the data as is (i.e., the "harder" you fit the data), the more sensitive your final model is to the data being a bit different. This sensitivity to the data being a bit different is called *variance*, so more features means more variance. More variance is, almost by definition, more susceptibility to overfitting.

Fewer variables make it more likely that you left out an important (or even marginally beneficial) predictor. This means you are less likely to capture the truth accurately, so more bias.

`More features => More variance => Easier to overfit. Fewer features => More bias => Easier to underfit.`
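You can see both chains numerically with a small sketch. Below is a minimal, hypothetical example (the data and polynomial degrees are mine, not from the question): fitting polynomials of increasing degree to quadratic data, where each added degree is one more feature. Training error only goes down as features are added, while test error is hurt at both extremes: a too-small model underfits (bias), and a too-large one overfits (variance).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: the truth is quadratic in x, plus noise.
x_train = rng.uniform(-3, 3, 30)
y_train = 1 + 2 * x_train - x_train**2 + rng.normal(0, 1, 30)
x_test = rng.uniform(-3, 3, 200)
y_test = 1 + 2 * x_test - x_test**2 + rng.normal(0, 1, 200)

def fit_and_score(degree):
    """Fit a degree-`degree` polynomial by least squares
    (each extra degree = one extra feature) and return
    (train_mse, test_mse)."""
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    return train_mse, test_mse

for d in (1, 2, 5, 10):
    tr, te = fit_and_score(d)
    print(f"degree {d:2d}: train MSE {tr:8.3f}, test MSE {te:8.3f}")
```

The degree-1 model can never express the curvature, so its test error stays high no matter how much data you have (bias/underfitting); the high-degree models chase the training noise, so their training error shrinks while their test error grows (variance/overfitting).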
