I recently discovered the RFE tool, and love it. I'd like to understand how this is different from vanilla backward elimination.
Despite reading lots of information about these two techniques, the penny doesn't seem to drop for me.
Here, the answer intimates that they are essentially the same thing.
Here, the writer suggests that RFE targets individual variable coefficients (I assume p-values or maybe effect size?), whereas Backward Elimination tries to achieve the lowest AIC score for the model as a whole.
Here, the writer suggests that RFE is a type of Backward Elimination, although the explanation is hard to decipher, and the essential difference is not addressed.
So, is RFE just Backward Elimination done by a data scientist, not a statistician?
Quoting Guyon in the paper that introduced RFE:
> This [RFE] iterative procedure is an instance of backward feature elimination (Kohavi, 2000, and references therein)
Indeed, when introducing RFE, Guyon does so using Support Vector Machines, and proposes two different methods to rank the single predictors.
At the same time, Kohavi tests backward elimination on both tree classifiers and Naive Bayes, so the scoring methods for the features were different.
All in all, the two methods are the same thing: start from a model with all predictors and remove them one by one based on some scoring function (Z-score for linear regression, Gini importance for tree-based methods, etc.), with the goal of maximizing some target metric (AIC, or test performance).
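To make the shared skeleton concrete, here is a minimal sketch of the loop both methods share, using an ordinary-least-squares fit and |coefficient| as the ranking score (the linear-model flavor of Guyon's RFE). The function `rfe_ols` and the synthetic data are illustrative, not from any of the papers; swapping in a p-value, AIC delta, or Gini importance as the score gives you the other variants described above.

```python
import numpy as np

def rfe_ols(X, y, n_keep):
    """Backward elimination / RFE sketch: repeatedly fit OLS,
    rank features by |coefficient|, and drop the weakest one
    until n_keep features remain. Returns surviving column indices."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        Xs = X[:, remaining]
        # standardize so coefficient magnitudes are comparable across features
        Xs = (Xs - Xs.mean(axis=0)) / Xs.std(axis=0)
        coef, *_ = np.linalg.lstsq(Xs, y - y.mean(), rcond=None)
        # eliminate the lowest-ranked (smallest |coefficient|) feature
        weakest = remaining[int(np.argmin(np.abs(coef)))]
        remaining.remove(weakest)
    return remaining

# Toy data: only features 0 and 2 actually drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] + 2 * X[:, 2] + rng.normal(scale=0.1, size=200)

print(sorted(rfe_ols(X, y, n_keep=2)))  # the two informative features survive
```

In practice you would not roll this by hand: scikit-learn's `sklearn.feature_selection.RFE` wraps exactly this loop around any estimator that exposes `coef_` or `feature_importances_`, which is why it works equally well for SVMs, linear models, and trees.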