# Solved – Multicollinearity, variable selection for cointegration testing in ARDL and VECM/VAR frameworks

I have 15 variables, some of which are highly correlated. I want to run a cointegration test in the ARDL and VAR/VECM frameworks. Because of the correlation, multicollinearity is a big problem; however, I do not want to omit variables, as I want to test all of them.

I am only interested in the long-run/short-run relationship between a variable \$y\$ and \$x_1,x_2,\dotsc,x_{14}\$, not in, e.g., \$x_1\$ on \$y,x_2,\dotsc,x_{14}\$.

Q: Is it possible to divide the variables into groups by, for example, category and correlation to avoid multicollinearity while still getting relevant results?

E.g. ARDL on \$y,x_1,x_2,x_3,x_4,x_5,x_6,x_7\$
and ARDL on \$y,x_8,x_9,x_{10},x_{11},x_{12},x_{13},x_{14}\$


#### Best Answer

If your variables are cointegrated, they will be highly correlated. That is to be expected and should not be perceived as problematic. (Note also that the sample correlation of the variables in levels will not have a meaningful counterpart in population due to the series being nonstationary in mean.)
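To illustrate the parenthetical note, here is a small simulation sketch (numpy only; the sample size, number of replications and seed are arbitrary choices). It generates pairs of *independent* random walks, which have no relationship at all, and compares the average absolute sample correlation computed on levels with that computed on first differences. The levels correlation tends to be sizeable even though nothing links the series, which is why it is not a useful diagnostic here.

```python
import numpy as np

def mean_abs_corr(n_obs=500, n_reps=200, seed=0):
    """Average |sample correlation| between two independent random walks,
    computed once on levels and once on first differences."""
    rng = np.random.default_rng(seed)
    levels, diffs = [], []
    for _ in range(n_reps):
        # Two independent Gaussian random walks: no true relationship at all.
        a = np.cumsum(rng.standard_normal(n_obs))
        b = np.cumsum(rng.standard_normal(n_obs))
        levels.append(abs(np.corrcoef(a, b)[0, 1]))
        diffs.append(abs(np.corrcoef(np.diff(a), np.diff(b))[0, 1]))
    return float(np.mean(levels)), float(np.mean(diffs))

lvl, dif = mean_abs_corr()
print(f"mean |corr| in levels: {lvl:.2f}, in differences: {dif:.2f}")
```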

Cointegration analysis of subsets of variables, in place of using all the variables at once, will generally be problematic. For example, if three variables \$x_1\$, \$x_2\$ and \$x_3\$ have only one cointegrating relationship (and thus two stochastic trends), and all three enter that relationship with nonzero coefficients, the presence of cointegration will not be revealed by examining subsets of variables: none of \$(x_1,x_2)\$, \$(x_1,x_3)\$ or \$(x_2,x_3)\$ will be cointegrated. Cointegration will only be revealed by examining the full three-variable system.
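A minimal simulation sketch of this example, assuming a made-up data-generating process where `x2` and `x3` are independent random walks and `x1 = x2 + x3 + noise` (so the single cointegrating vector is (1, -1, -1)). The helper `eg_df_stat` is hypothetical: it runs the Engle-Granger first-stage OLS and then a plain Dickey-Fuller t-statistic on the residuals, without augmentation lags. Its values must be compared with Engle-Granger (MacKinnon) critical values rather than normal ones; for real work a proper routine such as statsmodels' `coint` or a Johansen test is preferable.

```python
import numpy as np

def eg_df_stat(y, X):
    """Engle-Granger style statistic: OLS of y on X (plus a constant), then a
    plain Dickey-Fuller t-statistic (no augmentation lags) on the residuals."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                      # candidate equilibrium error
    de, lag = np.diff(e), e[:-1]
    gamma = lag @ de / (lag @ lag)        # regress de_t on e_{t-1}, no constant
    resid = de - gamma * lag
    se = np.sqrt(resid @ resid / (len(de) - 1) / (lag @ lag))
    return gamma / se                     # very negative => stationary residual

rng = np.random.default_rng(42)
n = 2000
x2 = np.cumsum(rng.standard_normal(n))   # stochastic trend 1
x3 = np.cumsum(rng.standard_normal(n))   # stochastic trend 2
x1 = x2 + x3 + rng.standard_normal(n)    # one cointegrating vector: (1, -1, -1)

stats = {
    "(x1,x2)": eg_df_stat(x1, x2),
    "(x1,x3)": eg_df_stat(x1, x3),
    "(x2,x3)": eg_df_stat(x2, x3),
    "(x1,x2,x3)": eg_df_stat(x1, np.column_stack([x2, x3])),
}
for k, v in stats.items():
    print(f"{k:12s} DF t-stat = {v:7.2f}")
```

Only the full three-variable regression should produce a strongly negative statistic; each pair leaves one random walk in the residual.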

However, in some special cases the problem will be less severe. For example, if the same three variables have two cointegrating relationships (and thus only one stochastic trend), you will discover cointegration in all possible pairs of variables (\$(x_1,x_2)\$, \$(x_1,x_3)\$ and \$(x_2,x_3)\$). In such a case, you could obtain two error correction terms by running bivariate regressions, e.g. \$x_1\$ on \$x_2\$ and \$x_1\$ on \$x_3\$ (like the first stage of the Engle-Granger procedure). You could then use them in a vector error correction model (VECM) to estimate the coefficients on the error correction terms and on other regressors (lags of first differences of \$x_1\$, \$x_2\$ and \$x_3\$). (When obtaining standard errors, I think you would have to account for the fact that the error correction terms are estimated rather than known precisely.)
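The two-step procedure can be sketched as follows, under a hypothetical data-generating process with one common random-walk trend `w` and stationary AR(1) deviations. Step 1 runs the two bivariate regressions and keeps their residuals as error correction terms; step 2 estimates the \$\Delta x_1\$ equation of the VECM by OLS. The helpers `ols` and `ar1`, the loadings 1, 0.5 and 2 on the trend, and the choice of a single lag of differences are all illustrative assumptions. The sketch reports point estimates only and does not address the standard-error issue mentioned above.

```python
import numpy as np

def ols(y, X):
    """OLS with a constant prepended; returns (coefficients, residuals)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def ar1(rng, n, rho):
    """Stationary AR(1) deviation series u_t = rho * u_{t-1} + eps_t."""
    u = np.zeros(n)
    eps = rng.standard_normal(n)
    for t in range(1, n):
        u[t] = rho * u[t - 1] + eps[t]
    return u

rng = np.random.default_rng(7)
n = 2000
w = np.cumsum(rng.standard_normal(n))     # the single common stochastic trend
x1 = w + ar1(rng, n, 0.5)
x2 = 0.5 * w + ar1(rng, n, 0.5)           # three variables, one trend:
x3 = 2.0 * w + ar1(rng, n, 0.5)           # two cointegrating relationships

# Step 1 (Engle-Granger-style first stage): two bivariate long-run regressions.
_, ec1 = ols(x1, x2)                       # equilibrium error of x1 vs x2
_, ec2 = ols(x1, x3)                       # equilibrium error of x1 vs x3

# Step 2: the dx1 equation of a VECM by OLS, on lagged EC terms and one lag
# of first differences of all three variables.
dx = np.diff(np.column_stack([x1, x2, x3]), axis=0)      # dx[i] = x[i+1] - x[i]
y = dx[1:, 0]                                            # dx1_t for t = 2..n-1
regs = np.column_stack([ec1[1:-1], ec2[1:-1], dx[:-1]])  # EC_{t-1}, dx_{t-1}
coef, _ = ols(y, regs)
print("loadings on EC terms:", coef[1:3].round(3))
print("coefficients on lagged differences:", coef[3:].round(3))
```

The \$\Delta x_2\$ and \$\Delta x_3\$ equations would use the same regressors, so equation-by-equation OLS covers the whole system.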

If you want to work with subsets, it could make sense to start by checking whether all the variables are pairwise cointegrated. If they are, you have the special case discussed above. If not, the pairwise results cannot tell you whether the whole system is cointegrated.
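A pairwise screening pass could look like the sketch below (numpy only). The function names are hypothetical, each pair is tested in one direction only (Engle-Granger results depend on which variable is the regressand), and `crit=-3.34` is an assumed approximate 5% Engle-Granger critical value for two variables with a constant; check it against proper tables. No correction for testing many pairs is made.

```python
import itertools
import numpy as np

def eg_df_stat(y, x):
    """Plain (non-augmented) Dickey-Fuller t-stat on residuals of OLS of y on x."""
    X = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    de, lag = np.diff(e), e[:-1]
    gamma = lag @ de / (lag @ lag)
    resid = de - gamma * lag
    return gamma / np.sqrt(resid @ resid / (len(de) - 1) / (lag @ lag))

def pairwise_screen(data, names, crit=-3.34):
    """Test every pair of columns; crit is an assumed critical value."""
    results = {}
    for i, j in itertools.combinations(range(data.shape[1]), 2):
        stat = eg_df_stat(data[:, i], data[:, j])
        results[(names[i], names[j])] = (stat, stat < crit)
    if all(flag for _, flag in results.values()):
        verdict = "all pairs cointegrated: special case with one common trend"
    else:
        verdict = "not all pairs cointegrated: inconclusive for the full system"
    return results, verdict

# Usage on simulated data with one common stochastic trend (hypothetical DGP).
rng = np.random.default_rng(3)
n = 1500
w = np.cumsum(rng.standard_normal(n))
data = np.column_stack([w + rng.standard_normal(n),
                        0.5 * w + rng.standard_normal(n),
                        -w + rng.standard_normal(n)])
results, verdict = pairwise_screen(data, ["x1", "x2", "x3"])
for pair, (stat, ok) in results.items():
    print(pair, round(float(stat), 2), bool(ok))
print(verdict)
```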

Finally, there could be some hybrid strategies based on the ideas above. The worst-case scenario is that there is only one cointegrating vector, involving all the variables, which you will not find by doing analysis on subsets.
