I know that if one is trying to perform linear regression, multicollinearity can be an issue because it can "lead to unreliable and unstable estimates of regression coefficients." Suppose for a second that I have the following correlation matrix for potential continuous predictor variables that I want to use to model a continuous dependent variable (hopefully the context of the problem doesn't matter for this question).

    x <- c(1, .05, .06, .74)
    y <- c(0.05, 1, .08, 0.07)
    z <- c(0.06, .08, 1, .03)
    a <- c(.74, .07, .03, 1)
    correlation_matrix <- data.frame(x, y, z, a, row.names = c("x", "y", "z", "a"))

         x    y    z    a
    x 1.00 0.05 0.06 0.74
    y 0.05 1.00 0.08 0.07
    z 0.06 0.08 1.00 0.03
    a 0.74 0.07 0.03 1.00

Apart from cor(a,x), the variables are correlated only in the range 0.01-0.10, which to me seems like a low level of correlation. My question is: if predictor variables are weakly correlated, as in this case, is it also true that the unreliability and instability of the coefficient estimates will be mild? Does the presence of cor(a,x) = 0.74 change the answer to this?
Lastly, if I have categorical variables, should I be looking at the correlation between categorical variables and continuous variables to investigate whether multicollinearity will be an issue in my analysis? I am curious about all of this in the context of linear regression. Thanks!
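There can be collinearity even when the pairwise correlations among all the variables are low. Suppose there were 10 IVs. Nine of them are completely uncorrelated with each other, and the 10th is the sum of the other nine. If you look at the correlation matrix, no correlation will be high (with equal variances, the 10th variable correlates only about 0.33 with each of the others), but there will be perfect collinearity.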
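
Here is a minimal sketch of that example in R, in case it helps to see it run. The data are simulated, and the sample size, seed, and variable names are all my own assumptions. The second part goes beyond what I said above: it computes variance inflation factors (VIFs) for the correlation matrix in the question, using the standard result that for standardized predictors the VIFs are the diagonal of the inverse correlation matrix. Treat it as one possible diagnostic, not the only one.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
n <- 1000    # hypothetical sample size

# Nine independent standard-normal predictors (simulated, hypothetical data)
X9 <- matrix(rnorm(n * 9), nrow = n, ncol = 9)
colnames(X9) <- paste0("x", 1:9)

# The tenth predictor is the exact sum of the other nine
X <- cbind(X9, x10 = rowSums(X9))

# Largest pairwise correlation: modest, even though x10 is fully redundant
R <- cor(X)
max_abs_cor <- max(abs(R[upper.tri(R)]))

# The design matrix has 10 columns but is rank deficient
design_rank <- qr(X)$rank

# lm() cannot estimate all ten slopes and reports NA for the aliased one
y <- rnorm(n)  # arbitrary response, unrelated to X
fit <- lm(y ~ X)
n_aliased <- sum(is.na(coef(fit)))

# VIFs for the correlation matrix given in the question
# (assumption: predictors standardized, so VIF = diag of inverse correlation matrix)
R_q <- matrix(c(1.00, 0.05, 0.06, 0.74,
                0.05, 1.00, 0.08, 0.07,
                0.06, 0.08, 1.00, 0.03,
                0.74, 0.07, 0.03, 1.00),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("x", "y", "z", "a"), c("x", "y", "z", "a")))
vif <- diag(solve(R_q))

print(max_abs_cor)
print(design_rank)
print(n_aliased)
print(round(vif, 2))
```

The point of printing `max_abs_cor` next to `design_rank` is that the correlation matrix alone would not flag the problem, while the rank (or a VIF, or a condition index) would. For the matrix in the question, the VIFs for `y` and `z` should come out close to 1, with `x` and `a` noticeably higher because of their 0.74 correlation.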