Solved – Multiple Regression – Testing for multicollinearity

Say I have a regression model that looks as follows. The goal is to predict credit card balance given a number of independent variables.

[Image: output of the regression model (not available)]

This is just a first pass at the model, and no attempt has yet been made to optimize it. I'm curious when the best time is to test for multicollinearity. Should it be now, before we go any further, or after we've narrowed things down to what we think will be our final independent variables?

I don't think it matters much. Checking later will save you unnecessary work, and spare you the frustration of transformations that turn out to be pointless if those variables don't make it into the final model. That said, running vif(model) takes almost no time, and you can always hold off on applying remedies for multicollinearity until later.
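
For readers curious about what a VIF check actually computes, here is a minimal sketch in Python/NumPy rather than R. It assumes the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors plus an intercept. The variables income, limit and age are hypothetical, simulated stand-ins, not the asker's data. A common rule of thumb treats values above 5 or 10 as a warning sign, but that cutoff is a convention, not a hard rule.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of the predictor matrix X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is from regressing column j
    on the remaining columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other predictors
        coef = np.linalg.lstsq(A, y, rcond=None)[0]
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Hypothetical simulated predictors: limit is built almost entirely from income,
# while age is generated independently of both.
rng = np.random.default_rng(0)
n = 500
income = rng.normal(50, 15, n)
limit = 100 * income + rng.normal(0, 300, n)
age = rng.normal(45, 12, n)
X = np.column_stack([income, limit, age])

print(np.round(vif(X), 2))  # income and limit come out high; age stays near 1
```

The loop mirrors the definition directly so each step is visible; for models with only single-degree-of-freedom terms this should match what vif(model) from R's car package reports.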

The problem with multicollinearity is that it inflates the standard errors of the affected coefficients, which makes their estimates unstable: their signs can flip and their significance can change. The 'good' thing (or rather, the convenient thing) about multicollinearity is that it affects only the collinear variables; the estimates for the remaining variables are not affected. This means that if the collinearity exists only among control variables, it is often OK to disregard it.
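
The claim that only the collinear variables are affected can be checked with a small simulation. This is a hedged sketch, not the answerer's method: it assumes a hypothetical true model y = 2 + x1 + x2 + x3 + noise, where x1 and x2 have correlation of about 0.99 and x3 is independent. It refits OLS on many simulated datasets and reports how much each slope estimate varies and how often it comes out with the wrong (negative) sign.

```python
import numpy as np

def coefficient_spread(n_sims=1000, n=200, rho=0.99, seed=1):
    """Refit OLS on many simulated datasets; summarize how each slope behaves."""
    rng = np.random.default_rng(seed)
    beta = np.array([2.0, 1.0, 1.0, 1.0])  # intercept, x1, x2, x3 (true slopes all positive)
    est = np.empty((n_sims, beta.size))
    for s in range(n_sims):
        x1 = rng.normal(size=n)
        x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)  # corr(x1, x2) is about rho
        x3 = rng.normal(size=n)                                      # unrelated to x1 and x2
        X = np.column_stack([np.ones(n), x1, x2, x3])
        y = X @ beta + rng.normal(size=n)
        est[s] = np.linalg.lstsq(X, y, rcond=None)[0]
    slopes = est[:, 1:]
    # Standard deviation of each slope across simulations, and share of sign flips.
    return slopes.std(axis=0), (slopes < 0).mean(axis=0)

spread, sign_flips = coefficient_spread()
print("sd of slope estimates (x1, x2, x3):", np.round(spread, 3))
print("share of negative estimates (x1, x2, x3):", sign_flips)
```

The slopes for x1 and x2 should scatter several times more widely than the slope for x3, and occasionally flip sign, while x3 stays tightly around its true value.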

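You can check whether it involves only the controls. If it does, optimize the model and leave it. If it involves your main explanatory variables, deal with it now, before optimizing. A common remedy is centering, which in R can be done with scale(var_to_scale, scale = FALSE). Note that centering mainly helps when the collinearity comes from interaction or polynomial terms built from those variables.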
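
The effect of centering can be illustrated with a short sketch. It is written in Python/NumPy, assuming that x - x.mean() is the equivalent of R's scale(x, scale = FALSE). The predictors x and z are hypothetical and simulated. The sketch compares two cases: collinearity created by a squared term, which centering largely removes, and collinearity between two distinct predictors, which centering cannot change because correlation is unaffected by shifting a variable.

```python
import numpy as np

def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# Hypothetical simulated predictors.
rng = np.random.default_rng(2)
x = rng.uniform(1, 10, 5000)
z = 0.5 * x + rng.normal(0, 2, 5000)  # a second predictor genuinely correlated with x

xc = x - x.mean()  # same idea as R's scale(x, scale = FALSE)
zc = z - z.mean()

# 1) Collinearity created by a polynomial term: centering removes most of it.
raw_sq, centered_sq = corr(x, x**2), corr(xc, xc**2)

# 2) Collinearity between two distinct predictors: centering changes nothing.
raw_xz, centered_xz = corr(x, z), corr(xc, zc)

print(f"x vs x^2: {raw_sq:.3f} raw, {centered_sq:.3f} centered")
print(f"x vs z:   {raw_xz:.3f} raw, {centered_xz:.3f} centered")
```

So if the main explanatory variables are collinear with each other rather than with their own products or powers, some other remedy, such as dropping or combining variables, is likely needed.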

Edit: the answer by @user3640761 raises a valid suggestion: check for high correlations in your data before doing anything else. It's easy and fast, and it can give a good indication.
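
A quick correlation screen of the kind suggested above might look like the following sketch (in R this is roughly cor() on the predictor columns). It assumes a cutoff of |r| >= 0.8, which is an arbitrary illustrative choice, and uses hypothetical simulated predictors. One caveat: pairwise correlations can miss collinearity that involves three or more variables jointly, so this complements a VIF check rather than replacing it.

```python
import numpy as np

def high_correlation_pairs(X, names, threshold=0.8):
    """Return (name_i, name_j, r) for every predictor pair with |r| >= threshold."""
    r = np.corrcoef(X, rowvar=False)  # columns are variables
    pairs = []
    p = r.shape[0]
    for i in range(p):
        for j in range(i + 1, p):
            if abs(r[i, j]) >= threshold:
                pairs.append((names[i], names[j], round(float(r[i, j]), 3)))
    return pairs

# Hypothetical simulated predictors: income, limit and rating are strongly related;
# age is generated independently.
rng = np.random.default_rng(3)
n = 400
income = rng.normal(50, 15, n)
limit = 100 * income + rng.normal(0, 300, n)
rating = limit / 12 + rng.normal(0, 20, n)
age = rng.normal(45, 12, n)
X = np.column_stack([income, limit, rating, age])
names = ["income", "limit", "rating", "age"]

for a, b, r in high_correlation_pairs(X, names):
    print(f"{a} ~ {b}: r = {r}")
```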
