How do important and insignificant variables impact a model?

I am building a model using logistic regression. One variable has strong predictive power, but a business rule prevents me from including it in the model. On the other hand, some variables are statistically insignificant in the model, but a business decision requires that they be included.

So my question is how the model will be impacted if I remove an important variable? And how will it be impacted if I add some insignificant variables? Is there a way to measure the impact?

At the risk of giving almost worthless advice, the only right answer is: It depends!

You should talk to your client/boss and ask why the business rules mandate including/excluding certain variables. In some cases, there might be a strict legal requirement. For example, in the US, your recruiting system definitely should not consider an applicant's race, gender or marital status when deciding which applicants to hire. These could be, and probably are, strong predictors in some industries, but including them would open your company up to a world of legal hurt. Similarly, other regulations may require that certain factors be considered in certain decisions, even if they're mostly useless.

Other times, the business rules might reflect something more negotiable. You clearly can't use tomorrow's numbers to predict today's sales (causality and all). However, depending on how your company collects and distributes its data, you might not be able to use yesterday's sales to predict today's either. Maybe the sales numbers are only updated twice a week! Similarly, some variables might be costly to measure regularly and your company is trying to control expenses. Depending on the strength of the predictor and the importance of your model, you might be able to lobby for different/additional data collection that would allow you to use these predictors.

Finally, some business rules are just dumb: they're based on bad assumptions, arise from antiquated procedures, or whatever. Perhaps they could be changed if you ask.

Removing a powerful predictor will generally make your model perform worse, unless the remaining variables carry much of the same information. Including irrelevant predictors is usually more tolerable: ideally all of their coefficients will be near zero, although each one adds a little variance to the estimates. Alternatively, you could slap on a feature selection step which "automatically" prunes them.
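As for measuring the impact, one option is to fit each variant of the model and compare them on held-out data. Here is a rough sketch of that idea in Python, under loud assumptions: the data is simulated (one strong predictor, one modest one, three pure-noise columns), the coefficients 2.0 and 0.5 are made up, and the logistic regression is a hand-rolled gradient descent so the snippet needs nothing beyond the standard library. With real data you would substitute your own columns and, most likely, a proper library fit.

```python
import math
import random

random.seed(0)  # fixed seed so the simulated data is reproducible

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, row):
    # w[0] is the intercept; w[1:] are the coefficients.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)))

def fit_logistic(X, y, lr=1.0, epochs=200):
    # Plain batch gradient descent on the mean log loss (no regularization).
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for row, target in zip(X, y):
            err = predict(w, row) - target
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def log_loss(w, X, y):
    # Mean negative log-likelihood; lower is better.
    total = 0.0
    for row, target in zip(X, y):
        p = min(max(predict(w, row), 1e-12), 1 - 1e-12)
        total -= target * math.log(p) + (1 - target) * math.log(1 - p)
    return total / len(X)

def select(data, cols):
    # Keep only the requested columns of each row.
    return [[row[c] for c in cols] for row in data]

# Simulated data (an assumption, not real business data): column 0 is a
# strong predictor, column 1 a modest one, columns 2-4 are pure noise.
rows = [[random.gauss(0, 1) for _ in range(5)] for _ in range(600)]
labels = [1 if random.random() < sigmoid(2.0 * r[0] + 0.5 * r[1]) else 0
          for r in rows]
train_X, test_X = rows[:420], rows[420:]
train_y, test_y = labels[:420], labels[420:]

variants = {
    "baseline (strong + modest)": [0, 1],
    "strong predictor removed": [1],
    "noise variables added": [0, 1, 2, 3, 4],
}

losses = {}
for name, cols in variants.items():
    w = fit_logistic(select(train_X, cols), train_y)
    losses[name] = log_loss(w, select(test_X, cols), test_y)
    print(f"{name}: held-out log loss = {losses[name]:.3f}")
```

In a setup like this, you would expect the held-out log loss to rise noticeably when the strong predictor is dropped and to barely move when the noise columns are added. The same comparison works with AUC or any other metric your business cares about, and the size of the gap gives you a concrete number to bring to your boss/client.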

I'd try to discern the reasons behind the business rules and offer your boss/client several options.
