Is linear regression/polynomial regression sensitive to irrelevant features/noise?
Will their respective weights/coefficients automatically be tuned down?
Or is it a straight nail in the coffin?
I'm asking whether or not I can just stack features without thinking about them and rely on the model to automatically discard useless features, like decision trees do, for example.
Best Answer
The answer is yes: those models are sensitive to noise, but every model is. The strongest signal is fit first, then progressively weaker signal, until what remains to be fit is no longer signal but noise. In plain least-squares linear or polynomial regression, the coefficients of irrelevant features are not automatically set to zero; they pick up small, nonzero values that fit the noise in the training sample. This happens with decision trees as well: the strongest variables are split on near the top of the tree, but the last splits at the bottom may be fitting noise rather than signal.
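To make the first point concrete, here is a minimal NumPy-only sketch on synthetic data (an assumption for illustration): 2 informative features plus 20 pure-noise features, 50 training rows, fit with ordinary least squares. The helpers `make_data`, `fit_ols` and `mse` are hypothetical names, not from any library. The noise features end up with nonzero coefficients, and the model does noticeably better on its own training data than on fresh data.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, n_noise, rng):
    """Hypothetical setup: y depends on columns 0 and 1 only; the rest is pure noise."""
    X = rng.normal(size=(n, 2 + n_noise))
    y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=1.0, size=n)
    return X, y

def fit_ols(X, y):
    """Ordinary least squares with an intercept column; returns (intercept, coefficients)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0], beta[1:]

def mse(X, y, intercept, coef):
    """Mean squared prediction error of a fitted linear model."""
    return float(np.mean((y - intercept - X @ coef) ** 2))

X_train, y_train = make_data(50, 20, rng)
X_test, y_test = make_data(1000, 20, rng)

intercept, coef = fit_ols(X_train, y_train)
print("informative coefs:", np.round(coef[:2], 2))
print("largest |noise coef|:", round(float(np.max(np.abs(coef[2:]))), 2))
print("train MSE:", round(mse(X_train, y_train, intercept, coef), 2))
print("test MSE: ", round(mse(X_test, y_test, intercept, coef), 2))
```

Nothing in plain least squares pushes the noise coefficients to exactly zero; the gap between training and test error is the cost of having fit them.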
The signal vs. noise tradeoff is the underfitting vs. overfitting tradeoff that every model faces. You can read more about it by searching for the bias-variance tradeoff.
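For polynomial regression, this tradeoff shows up directly as the choice of degree. The sketch below assumes a made-up target, `sin(2*pi*x)` plus Gaussian noise, with 30 training points, and uses NumPy's `Polynomial.fit` for the least-squares fits. Training error can only go down as the degree grows (the models are nested), while error on held-out data first drops (less underfitting) and then tends to rise again (more overfitting).

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)

def sample(n):
    """Hypothetical data: a smooth signal plus Gaussian noise."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)
    return x, y

x_tr, y_tr = sample(30)
x_te, y_te = sample(500)

degrees = range(1, 11)
train_err, test_err = [], []
for d in degrees:
    p = Polynomial.fit(x_tr, y_tr, deg=d)  # least-squares fit on a rescaled domain
    train_err.append(float(np.mean((p(x_tr) - y_tr) ** 2)))
    test_err.append(float(np.mean((p(x_te) - y_te) ** 2)))

for d, tr, te in zip(degrees, train_err, test_err):
    print(f"degree {d:2d}: train MSE {tr:.3f}  test MSE {te:.3f}")
```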
To find a good balance between signal and noise in decision trees, you can set a depth limit. The depth limit determines when the tree stops splitting, and ideally it should sit at the point where no signal is left to fit. To find the best depth, you can use cross-validation.
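Choosing the depth limit by cross-validation can be sketched as follows. To keep the example dependency-free, it assumes a single numeric feature and a hand-rolled regression tree; `build_tree`, `predict` and `cv_mse` are hypothetical helpers, not a library API. In practice you would more likely use a library tree, for example scikit-learn's `DecisionTreeRegressor(max_depth=...)` together with `cross_val_score`.

```python
import numpy as np

def build_tree(x, y, max_depth, min_leaf=3):
    """Tiny 1-D regression tree: greedily take the split that minimises the
    summed squared error of the two children; stop when max_depth hits 0."""
    if max_depth == 0 or len(y) < 2 * min_leaf:
        return float(y.mean())                      # leaf: predict the mean
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_sse, best_thr = np.inf, None
    for i in range(min_leaf, len(ys) - min_leaf + 1):
        if xs[i] == xs[i - 1]:
            continue                                # no threshold between equal values
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best_thr = sse, (xs[i - 1] + xs[i]) / 2
    if best_thr is None:
        return float(y.mean())
    mask = x <= best_thr
    return (best_thr,
            build_tree(x[mask], y[mask], max_depth - 1, min_leaf),
            build_tree(x[~mask], y[~mask], max_depth - 1, min_leaf))

def predict(tree, x0):
    while isinstance(tree, tuple):                  # internal node: (threshold, left, right)
        thr, left, right = tree
        tree = left if x0 <= thr else right
    return tree

def cv_mse(x, y, max_depth, k=5, seed=0):
    """k-fold cross-validated mean squared error for one depth limit."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(x)), k)
    errors = []
    for fold in folds:
        train = np.setdiff1d(np.arange(len(x)), fold)
        tree = build_tree(x[train], y[train], max_depth)
        pred = np.array([predict(tree, v) for v in x[fold]])
        errors.append(np.mean((pred - y[fold]) ** 2))
    return float(np.mean(errors))

# Hypothetical data: smooth signal plus noise.
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 1.0, size=200)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=200)

depths = list(range(0, 9))
scores = [cv_mse(x, y, d) for d in depths]
best_depth = depths[int(np.argmin(scores))]
for d, s in zip(depths, scores):
    print(f"max_depth={d}: CV MSE {s:.3f}")
print("chosen depth:", best_depth)
```

The depth with the lowest cross-validated error is the sketch's estimate of where the tree stops finding signal and starts fitting noise.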
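The same concept exists for linear regression techniques. You can add a regularizer to the loss, usually the L1 norm or the (squared) L2 norm of the model's parameter vector (lasso and ridge regression, respectively). The model then trades off fitting the data against keeping that norm small. With the L1 penalty, the coefficients of unhelpful variables are typically driven exactly to zero, so the model effectively uses fewer variables; the L2 penalty shrinks all coefficients toward zero but rarely makes any of them exactly zero. This way, if your model was fitting noise, you can discard (or at least down-weight) unhelpful variables. However, regularizing too strongly can make the model ignore helpful features as well, so the penalty strength has to be tuned, typically with cross-validation again, to find the best tradeoff between minimizing the model's error and minimizing this norm.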
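A NumPy-only sketch of the L1 vs. L2 difference, under stated assumptions: synthetic data with 2 informative and 10 pure-noise features, centred so no intercept is needed, and penalty strengths picked by hand rather than by cross-validation. The lasso is solved here with plain cyclic coordinate descent and ridge in closed form; `lasso_cd` and `ridge` are hypothetical helper names, not a library API.

```python
import numpy as np

def soft_threshold(z, a):
    """sign(z) * max(|z| - a, 0): the L1 proximal step, which produces exact zeros."""
    return np.sign(z) * max(abs(z) - a, 0.0)

def lasso_cd(X, y, alpha, n_iter=1000, tol=1e-8):
    """Minimise (1/2n)||y - Xb||^2 + alpha*||b||_1 by cyclic coordinate descent.
    Assumes the columns of X and y are centred (no intercept)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y - X @ b                                   # current residual
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            old = b[j]
            rho = X[:, j] @ r / n + col_sq[j] * old  # X_j . (residual without feature j) / n
            b[j] = soft_threshold(rho, alpha) / col_sq[j]
            r -= X[:, j] * (b[j] - old)             # keep the residual in sync
            max_step = max(max_step, abs(b[j] - old))
        if max_step < tol:
            break
    return b

def ridge(X, y, lam):
    """Minimise (1/2n)||y - Xb||^2 + (lam/2)*||b||_2^2 in closed form."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 12))          # columns 0-1 informative, 2-11 pure noise
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)
X -= X.mean(axis=0)
y -= y.mean()

b_lasso = lasso_cd(X, y, alpha=0.5)
b_ridge = ridge(X, y, lam=0.5)
print("lasso:", np.round(b_lasso, 3))
print("ridge:", np.round(b_ridge, 3))
print("noise coefs exactly zero (lasso, ridge):",
      int(np.sum(b_lasso[2:] == 0)), int(np.sum(b_ridge[2:] == 0)))
```

Both penalties also shrink the informative coefficients, which is the "too much regularization" cost described above; that is why the penalty strength is normally tuned rather than fixed.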