I have a large set of data and a couple of regressors that seem to be somewhat to highly correlated. I will include these in a GLM and am primarily interested in the predictive ability of the model and not inference on individual parameter estimates.
How will the predictive ability of the model be influenced by the multicollinearity?
Might I get outlying predictions?
Best Answer
In the context of the Normal GLM, multicollinearity isn't always such a problem for prediction. Often it can mean that although individual coefficients can't be estimated efficiently, the linear combination of them (i.e. the fitted values $X\boldsymbol{\hat{\beta}}$) can still be. This tends to be the case when tests on the model as a whole suggest it is a good fit to the data, but $t$-statistics and the like suggest that individual coefficients aren't significant.
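A small simulation can make this concrete. The sketch below is only an illustration under assumed settings, none of which come from the question: two standard-normal regressors with correlation `rho = 0.99`, made-up true coefficients, unit-variance Gaussian noise, and a plain least-squares fit via NumPy. It refits the model on many simulated data sets and compares how much the individual coefficients vary with how much the fitted value at a typical point varies.

```python
import numpy as np

def simulate_fits(n=200, rho=0.99, n_reps=300, seed=0):
    """Refit OLS on repeated simulated data sets with two correlated regressors.

    Returns the sampling standard deviation of each coefficient and of the
    fitted value at a point where x1 and x2 agree (as they do in the data).
    """
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, 2.0, 3.0])           # assumed true intercept, b1, b2
    cov = np.array([[1.0, rho], [rho, 1.0]])   # assumed regressor covariance
    x_new = np.array([1.0, 1.0, 1.0])          # intercept term, x1 = 1, x2 = 1

    coefs = np.empty((n_reps, 3))
    preds = np.empty(n_reps)
    for r in range(n_reps):
        Z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        X = np.column_stack([np.ones(n), Z])
        y = X @ beta + rng.normal(size=n)      # Normal errors with sd 1
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        coefs[r] = b
        preds[r] = x_new @ b
    return coefs.std(axis=0), preds.std()

coef_sd, pred_sd = simulate_fits()
print("sd of (intercept, b1, b2):", np.round(coef_sd, 3))
print("sd of fitted value at x1 = x2 = 1:", round(float(pred_sd), 3))
```

Under these assumptions the slopes on the two correlated regressors should vary far more from one data set to the next than the fitted value does, because the errors in the two slopes are strongly negatively correlated and largely cancel in the linear combination.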
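The model can still predict well provided that the covariates of the new observations whose response ($y$) you are trying to predict are correlated with each other in the same way as those used to fit the model. If a new observation breaks that pattern, the prediction leans on the poorly estimated individual coefficients and can be far off, which is how outlying predictions can arise.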
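The condition on the new covariates can be checked with the usual formula for the standard error of a fitted mean, $\sigma\sqrt{x_0^\top (X^\top X)^{-1} x_0}$. The following is a minimal sketch under assumed settings (a simulated design with correlation `rho = 0.99`, noise sd `sigma = 1`, and two hypothetical new points); the function names are illustrative only.

```python
import numpy as np

def prediction_se(X, x_new, sigma=1.0):
    """Standard error of the fitted mean x_new @ beta_hat for OLS with noise sd sigma."""
    XtX_inv = np.linalg.inv(X.T @ X)
    return sigma * np.sqrt(x_new @ XtX_inv @ x_new)

def compare_points(n=200, rho=0.99, seed=1):
    """Compare prediction SEs at a point that follows the training correlation
    pattern and at one that breaks it."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])     # assumed regressor covariance
    Z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    X = np.column_stack([np.ones(n), Z])         # design matrix with intercept

    on_pattern = np.array([1.0, 1.0, 1.0])       # x1 and x2 agree, as in training
    off_pattern = np.array([1.0, 1.0, -1.0])     # x1 and x2 disagree, unlike training
    return prediction_se(X, on_pattern), prediction_se(X, off_pattern)

se_on, se_off = compare_points()
print(f"on-pattern SE: {se_on:.3f}, off-pattern SE: {se_off:.3f}")
```

With regressors this strongly correlated, the off-pattern point should come out with a much larger standard error than the on-pattern one, since it lies in a direction of covariate space that the training data barely cover.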
See Sections 10.8 and 10.9 of Basic Econometrics by Gujarati (2003) for more detail and guidance, as well as some good examples using economic data.