I wonder what causes the correlation between parameter estimates in regression analysis.

Does misspecification of the model cause such correlation? Or does multicollinearity cause it?

When such correlations between parameter estimates exist, do we need to search for the reason and rework the model to reduce them?

I ask this in the context of simple regression, but in practice I work with a logit model with a categorical dependent variable. In my situation there is strong correlation between the parameter estimates of the dummy variables (the levels of a categorical variable).

I would be very glad for any help. Thanks a lot.


#### Best Answer

The source of correlation between parameter estimates is **the finite size** of the design matrix.

Consider the OLS parameter covariance matrix estimate: $$\operatorname{Var}[\, \hat\beta \mid X \,] = \sigma^2 (X^T X)^{-1}$$ The columns of the design matrix are usually correlated; that's normal, and there's nothing wrong with it at all.
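To see this concretely, here is a small numpy sketch (with made-up data) that fits OLS on two correlated predictors and computes the estimated covariance $\sigma^2(X^TX)^{-1}$ of the coefficients, then converts it to a correlation matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Two positively correlated predictors (hypothetical data for illustration)
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# OLS fit and residual variance estimate
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
sigma2 = resid @ resid / (n - X.shape[1])

# Estimated covariance of beta_hat: sigma^2 (X^T X)^{-1}
cov = sigma2 * np.linalg.inv(X.T @ X)
sd = np.sqrt(np.diag(cov))
corr = cov / np.outer(sd, sd)

# The (x1, x2) entry is strongly negative because the columns of X
# are positively correlated: the estimates trade off against each other.
print(corr[1, 2])
```

Note the sign: positively correlated columns typically produce *negatively* correlated coefficient estimates, since the two coefficients can compensate for one another in the fit.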

The formula is for a *finite sample*, which is a very important consideration. Why? Because as you collect more and more observations, the diagonal entries of $X^T X$ grow without bound, so $(X^T X)^{-1}$ shrinks toward zero and the uncertainty vanishes. When there's no uncertainty, the question of correlations is moot.
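The shrinkage with sample size can be checked directly. A minimal sketch (assuming a known noise variance of 1 for simplicity) that computes the slope variance from $\sigma^2(X^TX)^{-1}$ at two sample sizes:

```python
import numpy as np

rng = np.random.default_rng(1)

def slope_variance(n):
    """Slope entry of sigma^2 (X^T X)^{-1} for a sample of size n."""
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    sigma2 = 1.0  # assume the noise variance is known, for the sketch
    return (sigma2 * np.linalg.inv(X.T @ X))[1, 1]

# The diagonal of X^T X grows roughly like n, so the variance of the
# estimate shrinks roughly like 1/n.
print(slope_variance(100), slope_variance(10_000))
```

With 100× the data, the variance drops by roughly a factor of 100, which is the sense in which uncertainty (and with it the correlation question) disappears in the limit.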

Of course, if the design matrix columns are uncorrelated, the parameter estimates will be uncorrelated too, but this is truly a rare situation. Almost all design matrices have correlated columns.
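The uncorrelated case does occur by construction in designed experiments. A sketch using a hypothetical $2^2$ factorial coding, whose columns are mutually orthogonal, so $X^TX$ is diagonal and $\sigma^2(X^TX)^{-1}$ has no off-diagonal terms:

```python
import numpy as np

# Orthogonal design matrix (hypothetical 2^2 factorial coding):
# intercept column plus two +/-1 factor columns.
X = np.array([
    [1.0,  1.0,  1.0],
    [1.0,  1.0, -1.0],
    [1.0, -1.0,  1.0],
    [1.0, -1.0, -1.0],
])

XtX = X.T @ X          # diagonal: 4 * I, because the columns are orthogonal
cov = np.linalg.inv(XtX)  # take sigma^2 = 1 for the sketch

# All off-diagonal entries of the parameter covariance are zero,
# so the parameter estimates are uncorrelated.
off_diag = cov - np.diag(np.diag(cov))
print(np.allclose(off_diag, 0.0))
```

Observational data almost never has this structure, which is why correlated parameter estimates are the norm rather than a symptom of anything wrong.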
