Solved – Why does SAS Enterprise Miner keep all dummy variables for a coded categorical variable in stepwise logistic regression?

SAS Enterprise Miner nicely creates coded dummy variables for any categorical variables used in a logistic regression model. When it performs a variable selection using stepwise sequential selection in the Regression node, however, if one of the dummy variables is included in the regression model, all of the other dummy variables are then also automatically included, even if they are not found to be predictive of the target.

Here's a snippet of the node Results output after the stepwise selection showing that the dummy variables for some of the levels of the Industry variable are significant in the model, but others are not.

    Parameter                          DF    Estimate    Std Error    Chi-Square    Pr > ChiSq
    Intercept                           1     -9.2383       1.9222         23.10        <.0001
    IMP_REP_Age                         1      0.3594       0.0938         14.69        0.0001
    IMP_UnionSubs      No               1      0.5114       0.1472         12.07        0.0005
    Industry           Agriculture      1      1.4439       0.1871         59.54        <.0001
                       Construction     1      1.2982       0.2228         33.97        <.0001
                       Finance          1     -0.3826       0.2536          2.28        0.1313
                       IT               1     -0.1355       0.2641          0.26        0.6080
                       Professional     1      0.3569       0.3469          1.06        0.3037
                       Public Sector    1     -2.3698       0.3522         45.28        <.0001
                       Retail           1     -1.3766       0.5483          6.30        0.0120
    Occupation Type    Casual           1      0.0260       0.2499          0.01        0.9171
                       Employed         1     -0.9068       0.1828         24.61        <.0001

So, for example, the Industry-Agriculture variable seems predictive of the target, but the Industry-IT variable does not. All seven dummy variables for the seven levels of the Industry variable are included in the final model, however.

It seems to me that in the stepwise selection the dummy variables should be treated as individual variables rather than as a group. Does anyone know why SAS Enterprise Miner does it differently?

Much as I dislike stepwise regression, if you are going to do it, I think EM's behavior is appropriate, for two reasons.

1) A set of dummy variables all go together. In your model, you are saying that industry is a predictor, not that one particular industry is.

2) If you dropped the nonsignificant dummy variables, the estimates for the others would change, because you would then either be controlling for different things or eliminating some subjects from the sample.
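To make point 2 concrete, here is a minimal Python sketch, not SAS and not what Enterprise Miner runs internally. It assumes reference-cell dummy coding (the post does not say which coding the node used) and a model with a single categorical predictor, where the logistic estimates have a closed form. The counts and the helper `dummy_coefficients` are made up for illustration. Dropping the Finance and IT dummies does not remove those subjects; their rows become all zeros and are pooled into the reference group, so the Agriculture coefficient changes even though the Agriculture data did not.

```python
import math


def logit(p):
    """Log-odds of a probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))


# Hypothetical (events, total) counts per industry level; NOT from the post.
counts = {
    "Retail": (20, 100),       # reference level in this sketch
    "Agriculture": (60, 100),
    "Finance": (40, 100),
    "IT": (45, 100),
}


def dummy_coefficients(counts, reference, dropped=()):
    """Closed-form logistic estimates for one categorical predictor.

    Levels listed in `dropped` get no dummy column, so their rows are all
    zeros and are indistinguishable from the reference level. The intercept
    is then the log-odds of that pooled baseline group, and each remaining
    dummy's coefficient is its level's log-odds minus the intercept.
    """
    baseline = [reference, *dropped]
    events = sum(counts[level][0] for level in baseline)
    total = sum(counts[level][1] for level in baseline)
    intercept = logit(events / total)
    coefs = {
        level: logit(e / n) - intercept
        for level, (e, n) in counts.items()
        if level not in baseline
    }
    return intercept, coefs


# Full set of dummies: every non-reference level is compared with Retail.
full_intercept, full = dummy_coefficients(counts, "Retail")

# "Nonsignificant" dummies removed: Finance and IT join the baseline group.
reduced_intercept, reduced = dummy_coefficients(
    counts, "Retail", dropped=("Finance", "IT")
)

print("Agriculture vs Retail only:        ", round(full["Agriculture"], 3))
print("Agriculture vs Retail+Finance+IT:  ", round(reduced["Agriculture"], 3))
```

With other predictors in the model, as in the output above, the closed form no longer applies, but the mechanism is the same: the remaining Industry coefficients would be contrasts against a different, mixed baseline.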
