I use cross-sectional macroeconomic variables with OLS. I found that my data suffers from multicollinearity and I am looking for solutions. I read about taking first differences of the variables and tried it. However, with first differences the coefficient values are totally different, and they do not make any sense to interpret.

How can I interpret the variables when I take first differences?

Is there any solution that does not require taking first differences?

Thanks


#### Best Answer

I don't understand what you mean by "first differences of the variables". But as far as I know, a very common way to deal with multicollinearity is penalized regression, such as ridge regression and the lasso.

Take ridge regression as an example to see how it works. Start with the linear regression model
$$ \boldsymbol y = \mathbf{X}\boldsymbol \beta + \boldsymbol \epsilon $$
where $\boldsymbol \epsilon \sim N(\mathbf{0}, \sigma^2 \mathbf{I})$ and $\mathbf{X}$ has full column rank.

The OLS approach minimizes the residual sum of squares (RSS). The OLS solution is $\hat{\boldsymbol \beta}_{OLS} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol y$, with $\mathrm{Var}(\hat{\boldsymbol \beta}_{OLS}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$.
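A minimal sketch in base R, on simulated data assumed purely for illustration, that computes the OLS solution and its estimated covariance straight from the formulas above and checks them against `lm()`. Here `X` includes a column of ones for the intercept, and $\sigma^2$ is replaced by its usual unbiased estimate $\mathrm{RSS}/(n - p)$.

```r
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n)   # simulated data, for illustration only

# Design matrix with an intercept column
X <- cbind(1, x1, x2)

# Closed-form OLS: (X'X)^{-1} X'y
XtX_inv  <- solve(crossprod(X))
beta_ols <- drop(XtX_inv %*% crossprod(X, y))

# Estimated Var(beta_hat) = sigma^2_hat * (X'X)^{-1}
resid      <- y - drop(X %*% beta_ols)
sigma2_hat <- sum(resid^2) / (n - ncol(X))
var_beta   <- sigma2_hat * XtX_inv

fit <- lm(y ~ x1 + x2)
print(cbind(closed_form = beta_ols, lm = coef(fit)))
```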

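**When multicollinearity exists**, $\mathbf{X}'\mathbf{X}$ becomes close to singular, so its inverse (and therefore $\mathrm{Var}(\hat{\boldsymbol \beta}_{OLS})$) has very large entries and the estimates become unstable. Under perfect collinearity $\mathbf{X}'\mathbf{X}$ is exactly singular and noninvertible, and there are infinitely many solutions that minimize the RSS (the theory of generalized inverses helps explain this).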
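To see the variance blow-up numerically, here is a small base R sketch on simulated data (an assumption, loosely mirroring the answer's example further down): `x2` is built as `x1` plus a rescaled fixed noise vector `z`, and we track the diagonal entry of $(\mathbf{X}'\mathbf{X})^{-1}$ belonging to `x2`, which is $\mathrm{Var}(\hat\beta_2)/\sigma^2$. Shrinking the noise scale by a factor of 10 should inflate that entry by a factor of 100.

```r
set.seed(123)
n  <- 100
x1 <- rnorm(n, mean = 10)
z  <- rnorm(n)                  # fixed noise vector, rescaled below
sds <- c(1, 0.1, 0.01, 0.001)

# Diagonal entry of (X'X)^{-1} for x2: Var(beta_2 hat) = sigma^2 times this
var_factor_x2 <- sapply(sds, function(s) {
  x2 <- x1 + s * z              # smaller s => x2 closer to an exact copy of x1
  X  <- cbind(1, x1, x2)
  diag(solve(crossprod(X)))[3]
})
cor_x1_x2 <- sapply(sds, function(s) cor(x1, x1 + s * z))

print(data.frame(sd = sds, cor = cor_x1_x2, var_factor_x2 = var_factor_x2))
```

The correlation creeps toward 1 while the variance factor grows without bound, which is exactly the "near-singular" behaviour described above.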

Instead of minimizing the RSS, ridge regression minimizes
$$ \mathrm{RSS} + \lambda \sum_{i=1}^p \beta_i^2 \quad \left(\text{or} \quad \mathrm{RSS} + \lambda \|\boldsymbol \beta\|^2\right) $$
where $\lambda \ge 0$ is a tuning parameter.

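The ridge solution is then
$$ \hat{\boldsymbol \beta}_{Ridge} = (\mathbf{X}'\mathbf{X} + \lambda \mathbf{I})^{-1}\mathbf{X}'\boldsymbol y $$
For $\lambda > 0$, $(\mathbf{X}'\mathbf{X} + \lambda \mathbf{I})$ is always invertible: its eigenvalues are those of $\mathbf{X}'\mathbf{X}$ (all $\ge 0$) shifted up by $\lambda$, so none of them is zero. When the design matrix is orthonormal ($\mathbf{X}'\mathbf{X} = \mathbf{I}$), we have
$$ \hat{\boldsymbol \beta}_{Ridge} = \frac{1}{1 + \lambda} \hat{\boldsymbol \beta}_{OLS} $$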
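A minimal base R sketch of the closed form, assuming simulated data and a helper named `ridge_closed_form` that is hypothetical (not from any package). It has no intercept column, so every coefficient is penalized; in practice predictors are usually centered and scaled first and the intercept is left unpenalized, which is what `MASS::lm.ridge` does. The sketch checks two claims above: the $1/(1+\lambda)$ shrinkage for an orthonormal design, and invertibility even when $\mathbf{X}'\mathbf{X}$ is exactly singular.

```r
# Hypothetical helper: ridge estimate (X'X + lambda I)^{-1} X'y, no intercept
ridge_closed_form <- function(X, y, lambda) {
  p <- ncol(X)
  drop(solve(crossprod(X) + lambda * diag(p), crossprod(X, y)))
}

set.seed(42)
n <- 100
p <- 3
# Orthonormal design: Q factor of a random matrix, so X'X = I
X <- qr.Q(qr(matrix(rnorm(n * p), n, p)))
y <- drop(X %*% c(3, -2, 1)) + rnorm(n)

lambda     <- 2
beta_ols   <- ridge_closed_form(X, y, 0)       # lambda = 0 gives OLS
beta_ridge <- ridge_closed_form(X, y, lambda)
print(rbind(ols = beta_ols, ridge = beta_ridge,
            ols_over_1_plus_lambda = beta_ols / (1 + lambda)))

# Exactly collinear design: X'X is singular, but X'X + lambda I is not
Xc     <- cbind(X[, 1], X[, 1])
rank_c <- qr(crossprod(Xc))$rank               # 1, i.e. singular
beta_c <- ridge_closed_form(Xc, y, lambda = 1) # still solvable
print(beta_c)
```

With two identical columns, ridge splits the effect equally between them instead of failing.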

The following simple example shows how ridge regression works when there is collinearity.

```r
> library(MASS)
> set.seed(123)
> n <- 100
> x1 <- rnorm(n, mean=10)
> x2 <- rnorm(n, x1, sd = 0.01)
> y <- 2 + x1 + x2 + rnorm(n)
> lm(y ~ x1 + x2)

Call:
lm(formula = y ~ x1 + x2)

Coefficients:
(Intercept)           x1           x2  
      3.467       -1.514        3.381  

> lm.ridge(y ~ x1 + x2, lambda = 1)
                 x1        x2 
3.5693330 0.9143488 0.9420563 
```

If we increase the correlation between x1 and x2 by changing to `sd = 0.001`, the effect of collinearity becomes clearer.

```r
> set.seed(123)
> n <- 100
> x1 <- rnorm(n, mean=10)
> x2 <- rnorm(n, x1, sd = 0.001)
> y <- 2 + x1 + x2 + rnorm(n)
> lm(y ~ x1 + x2)

Call:
lm(formula = y ~ x1 + x2)

Coefficients:
(Intercept)           x1           x2  
      3.467      -22.944       24.811  

> lm.ridge(y ~ x1 + x2, lambda = 1)
                 x1        x2 
3.5703739 0.9267930 0.9295144 
```

Note that the sum of the two OLS slope estimates stays close to 2 (the true value $1 + 1$), even though the individual estimates swing wildly. (If you simulate x2 with its mean equal to a*x1, say a = 10, you will see an interesting but similar phenomenon.)
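The sketch below re-runs the two simulations above inside a small wrapper (`slope_sum` is a hypothetical name) and tabulates each OLS slope alongside their sum. Because both runs use the same seed, `x2` is `x1` plus the same noise vector at two different scales, so the fitted sum of slopes should come out identical across the two runs, while the individual slopes change drastically.

```r
# Re-run the answer's simulation for a given sd and report the OLS slopes
slope_sum <- function(s) {
  set.seed(123)
  n  <- 100
  x1 <- rnorm(n, mean = 10)
  x2 <- rnorm(n, x1, sd = s)
  y  <- 2 + x1 + x2 + rnorm(n)
  b  <- coef(lm(y ~ x1 + x2))
  c(beta1 = unname(b["x1"]), beta2 = unname(b["x2"]),
    sum = unname(b["x1"] + b["x2"]))
}

res <- sapply(c(0.01, 0.001), slope_sum)
colnames(res) <- c("sd = 0.01", "sd = 0.001")
print(res)
```

The data only pin down the combined effect of the two nearly identical predictors; how that effect is split between them is what collinearity makes unstable.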

**Some notes**:

- Ridge regression shrinks the estimates towards zero, but they cannot be exactly zero unless $\lambda$ goes to infinity. As $\lambda$ approaches 0, the ridge solution converges to the OLS solution;
- It is essentially a bias-variance trade-off: it greatly reduces the variance of the estimates by introducing some bias, which can lead to a smaller mean squared error. This is also one of the main rationales behind penalized regression;
- It cannot do variable selection, in the sense that all variables keep non-zero coefficients. Some can be very small, but they are not exactly zero;
- Lasso regression, which uses an L1 penalty $\lambda \sum_{i=1}^p |\beta_i|$, can do variable selection.
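The first and third notes can be checked on the answer's own simulated data with `MASS::lm.ridge`, which accepts a vector of `lambda` values. The grid below is an arbitrary choice for illustration. `lm.ridge` standardizes the predictors internally and does not penalize the intercept, so only the slopes shrink toward zero; `coef()` reports them back on the original scale.

```r
library(MASS)

set.seed(123)
n  <- 100
x1 <- rnorm(n, mean = 10)
x2 <- rnorm(n, x1, sd = 0.01)
y  <- 2 + x1 + x2 + rnorm(n)

lambdas <- c(0, 0.01, 1, 100, 1e4, 1e6)     # arbitrary illustrative grid
fit  <- lm.ridge(y ~ x1 + x2, lambda = lambdas)
path <- coef(fit)                           # one row per lambda
rownames(path) <- paste("lambda =", lambdas)
print(round(path, 4))

ols <- coef(lm(y ~ x1 + x2))                # lambda = 0 row should match this
```

The slopes get small for large $\lambda$ but never hit exactly zero, and the $\lambda = 0$ row reproduces the wild OLS estimates.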

Another way to deal with multicollinearity is principal component regression. It regresses the dependent variable on the principal components of the independent variables obtained by PCA.
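A hand-rolled base R sketch of principal component regression on the same simulated data, assuming standardized predictors and keeping only the first component (`k = 1` is an illustrative choice, not a rule). The component coefficients are then mapped back to the original predictors so they can be compared with the OLS and ridge results above.

```r
set.seed(123)
n  <- 100
x1 <- rnorm(n, mean = 10)
x2 <- rnorm(n, x1, sd = 0.01)
y  <- 2 + x1 + x2 + rnorm(n)

X   <- cbind(x1, x2)
pca <- prcomp(X, center = TRUE, scale. = TRUE)
print(summary(pca)$importance["Proportion of Variance", ])

k   <- 1                                    # keep only the first component
pcs <- pca$x[, 1:k, drop = FALSE]
fit <- lm(y ~ pcs)

# Map the PC coefficients back to the original predictors:
# scores = ((X - center) / scale) %*% rotation
gamma     <- coef(fit)[-1]
beta_std  <- pca$rotation[, 1:k, drop = FALSE] %*% gamma   # per-SD scale
beta      <- drop(beta_std) / pca$scale                     # original units
intercept <- unname(coef(fit)[1] - sum(beta * pca$center))
print(c(intercept = intercept, beta))
```

Dropping the low-variance component (essentially `x2 - x1`) discards the direction the data cannot estimate, so the two slopes come out nearly equal, much like the ridge estimates.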