Solved – Why use fewer features to fix high variance in a machine learning algorithm

How does getting a smaller feature space result in combating high variance in a machine learning algorithm? What I currently understand is the following:

  1. With fewer features the hypothesis can fit the data more easily, and therefore it helps against high bias (underfitting).

  2. With more features the hypothesis has a harder time fitting the data, and therefore it helps against high variance (overfitting).

Please point out to me what is wrong with my thought process.

You have it backwards. I'm not sure what your reasoning is for your connections (you just stated a belief, not the reasons for those beliefs), but here are the correct chains of reasoning.

When you expand the number of features under consideration, you create more options for the final, fit model. When there are more options, you can use your greater flexibility to find a result that is closer to recreating the data as is. The more you recreate the data as is (i.e. the "harder" you fit the data), the more sensitive your final model is to the data being a bit different. This sensitivity to the data being a bit different is called variance, so more features means more variance. More variance is almost by definition more susceptibility to overfitting.
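You can see this sensitivity empirically. The sketch below (my own illustration, not from the original post) resamples a training set many times and measures how much the fitted model's predictions on a fixed grid change from sample to sample; polynomial degree stands in for the number of features. All names and the toy data-generating function are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
x_grid = np.linspace(0, 1, 50)  # fixed points where we compare predictions

def predictions(degree, trials=200, n=30, noise=0.3):
    """Fit a degree-`degree` polynomial on many fresh noisy samples
    and collect its predictions on the fixed grid."""
    preds = []
    for _ in range(trials):
        x = rng.uniform(0, 1, n)
        y = np.sin(2 * np.pi * x) + rng.normal(0, noise, n)
        coefs = np.polyfit(x, y, degree)          # fit on a fresh sample
        preds.append(np.polyval(coefs, x_grid))   # predict on the same grid
    return np.array(preds)

# Variance of the predictions across resampled training sets,
# averaged over the grid: an empirical proxy for model variance.
for degree in (1, 3, 9):
    var = predictions(degree).var(axis=0).mean()
    print(f"degree {degree}: prediction variance {var:.3f}")
```

The more flexible (higher-degree) fits track each particular sample's noise, so their predictions swing more between resamples: that swing is exactly the variance the answer describes.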

Fewer variables make it more likely that you left out an important (or even marginally beneficial) predictor. This means you are less likely to capture the truth accurately, so more bias.
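As a quick illustration of that direction (again my own sketch, with made-up coefficients): if the truth depends on two predictors but the model only sees one, no amount of data removes the resulting systematic error.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 * x1 + 3.0 * x2 + rng.normal(0, 0.1, n)  # truth uses both predictors

# Full model: least squares on both predictors.
X_full = np.column_stack([x1, x2])
beta_full = np.linalg.lstsq(X_full, y, rcond=None)[0]

# Reduced model: x2 is left out, so its effect is unrepresentable.
beta_red = np.linalg.lstsq(x1[:, None], y, rcond=None)[0]

err_full = np.mean((y - X_full @ beta_full) ** 2)
err_red = np.mean((y - x1[:, None] @ beta_red) ** 2)
print(f"full model MSE: {err_full:.3f}, reduced model MSE: {err_red:.3f}")
```

The reduced model's error stays large however well it is fit, because the missing `x2` term is a bias baked into the model class, not noise that averaging could remove.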

More features => More variance => Easier to overfit. Fewer features => More bias => Easier to underfit.
