SAS Enterprise Miner nicely creates coded dummy variables for any categorical variables used in a logistic regression model. When it performs a variable selection using stepwise sequential selection in the Regression node, however, if one of the dummy variables is included in the regression model, all of the other dummy variables are then also automatically included, even if they are not found to be predictive of the target.
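To make "coded dummy variables" concrete, here is a minimal Python sketch of reference (0/1) coding. It is one common scheme and is not necessarily the coding Enterprise Miner applies by default. A categorical input with k levels becomes k-1 indicator columns, and the reference level gets no column of its own. The helper `reference_code` and the reference level `"Other"` are hypothetical and were invented for illustration. The post does not show which Industry level serves as the reference.

```python
from typing import Dict, List, Sequence


def reference_code(values: Sequence[str], levels: Sequence[str],
                   reference: str, prefix: str) -> Dict[str, List[int]]:
    """Expand one categorical input into k-1 indicator (0/1) columns.

    Hypothetical helper: the reference level gets no column, so a row at
    the reference level is 0 in every generated column.
    """
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not in levels")
    return {
        f"{prefix}_{lvl}": [1 if v == lvl else 0 for v in values]
        for lvl in levels
        if lvl != reference
    }


# Hypothetical data: "Other" is an invented reference level, not from the post.
levels = ["Agriculture", "Construction", "IT", "Other"]
industry = ["IT", "Other", "Agriculture", "IT", "Construction"]
design = reference_code(industry, levels, reference="Other", prefix="Industry")
for name, column in design.items():
    print(name, column)
```

The columns together encode a single input. That is why a selection procedure can reasonably treat them as one effect rather than as unrelated predictors.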
Here's a snippet of the node Results output after the stepwise selection showing that the dummy variables for some of the levels of the Industry variable are significant in the model, but others are not.
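| Parameter | Level | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|---|
| Intercept | | 1 | -9.2383 | 1.9222 | 23.10 | <.0001 |
| IMP_REP_Age | | 1 | 0.3594 | 0.0938 | 14.69 | 0.0001 |
| IMP_UnionSubs | No | 1 | 0.5114 | 0.1472 | 12.07 | 0.0005 |
| Industry | Agriculture | 1 | 1.4439 | 0.1871 | 59.54 | <.0001 |
| Industry | Construction | 1 | 1.2982 | 0.2228 | 33.97 | <.0001 |
| Industry | Finance | 1 | -0.3826 | 0.2536 | 2.28 | 0.1313 |
| Industry | IT | 1 | -0.1355 | 0.2641 | 0.26 | 0.6080 |
| Industry | Professional | 1 | 0.3569 | 0.3469 | 1.06 | 0.3037 |
| Industry | Public Sector | 1 | -2.3698 | 0.3522 | 45.28 | <.0001 |
| Industry | Retail | 1 | -1.3766 | 0.5483 | 6.30 | 0.0120 |
| Occupation Type | Casual | 1 | 0.0260 | 0.2499 | 0.01 | 0.9171 |
| Occupation Type | Employed | 1 | -0.9068 | 0.1828 | 24.61 | <.0001 |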
So, for example, the Industry-Agriculture variable seems predictive of the target, but the Industry-IT variable does not. All seven Industry dummy variables are included in the final model, however.
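The per-level p-values in the table are single-degree-of-freedom Wald tests. The chi-square is (estimate / standard error)², and with 1 df its p-value equals the two-sided normal tail. The Python sketch below recomputes a few rows from the rounded estimates, so the last digit can differ slightly from the SAS output. Each test concerns one level's contrast (against the reference level, under reference coding) and does not test Industry as a whole. The function name `wald_test` is hypothetical.

```python
import math


def wald_test(estimate: float, std_error: float) -> tuple[float, float]:
    """Wald chi-square (1 df) and its p-value for a single coefficient."""
    z = estimate / std_error
    chi_square = z * z
    # For 1 df, P(chi2 > z^2) equals the two-sided normal tail erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return chi_square, p_value


# (Estimate, Standard Error) pairs copied from the Industry rows of the output above.
rows = {
    "Agriculture": (1.4439, 0.1871),
    "Finance": (-0.3826, 0.2536),
    "IT": (-0.1355, 0.2641),
    "Retail": (-1.3766, 0.5483),
}
results = {level: wald_test(est, se) for level, (est, se) in rows.items()}
for level, (chi_sq, p) in results.items():
    print(f"Industry {level}: chi-square {chi_sq:.2f}, p {p:.4f}")
```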
It seems to me that in the stepwise selection the dummy variables should be treated as individual variables rather than as a group. Does anyone know why SAS Enterprise Miner does it differently?
Best Answer
Much as I dislike stepwise regression, if you are going to do it, I think EM's behavior is appropriate, for two reasons:
1) A set of dummy variables goes together. In your model, you are saying that industry is a predictor, not any particular industry.
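2) If you dropped the nonsignificant dummy variables, the estimates for the remaining ones would change, because you would then be either controlling for different things (under reference coding, the dropped levels get lumped in with the reference level) or eliminating some subjects from the sample.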
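The "controlling for different things" half of point 2 can be seen in a minimal Python sketch. It assumes reference coding and a logistic model whose only input is one categorical variable. Such a model is saturated over its groups, so the maximum-likelihood fit reproduces each group's observed log-odds and no iterative fitting is needed. Dropping level B's dummy pools B with the reference group, which shifts the intercept and therefore level A's coefficient. The counts and the function `one_way_logistic` are hypothetical and invented for illustration. This is not Enterprise Miner's procedure.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def one_way_logistic(counts, reference, dropped=()):
    """Closed-form MLE of a logistic model with one categorical input.

    Levels in `dropped` lose their dummy column, so they are pooled with the
    reference level. Assumes every group has 0 < events < observations.
    Returns (intercept, {level: coefficient}).
    """
    pooled = [reference, *dropped]
    events = sum(counts[lvl][0] for lvl in pooled)
    total = sum(counts[lvl][1] for lvl in pooled)
    intercept = logit(events / total)  # log-odds of the (pooled) reference group
    coefs = {
        lvl: logit(e / n) - intercept
        for lvl, (e, n) in counts.items()
        if lvl not in pooled
    }
    return intercept, coefs


# Hypothetical counts: (events, observations) per level, invented for illustration.
counts = {"Ref": (20, 100), "A": (50, 100), "B": (22, 100)}

_, full = one_way_logistic(counts, reference="Ref")
_, reduced = one_way_logistic(counts, reference="Ref", dropped=("B",))
print(f"A with B's dummy kept:    {full['A']:.4f}")
print(f"A with B's dummy dropped: {reduced['A']:.4f}")
```

Here B looks similar to the reference level, yet dropping its dummy still moves A's coefficient from ln 4 (about 1.386) to about 1.325, because A is now compared with a different baseline.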