I have read in a couple of slide decks on the internet that if I have two $I(1)$ processes, it's not a good idea to simply take the differences and include them in a VAR model, as one might lose relevant information by doing so. However, I have a hard time understanding why exactly this is. So the question is really whether anyone can explain this in more detail by providing a simple example (maybe with two time series) of when and how the "loss of relevant information" happens.


#### Best Answer

Think of it this way: when data is I(1), that is interesting. It tells us something about the underlying process. Further, if you have two I(1) processes and they are cointegrated, then this is really interesting. If I first-difference, I remove all this interesting information. For example, if you first-difference all the data and then run a VAR, it's simply not going to have as much information as a VECM or a VAR in levels. The VECM, for example, will explicitly model the long-run relationship between the variables directly. The VAR in levels will also include this long-run relationship, but it will not be explicitly modeled. This is important information that your statistical model should know about. Taking first differences is kind of like lying to your statistical model: you're telling it that your data is I(0) when in fact it is not.
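To make this concrete, here is a minimal sketch in standard notation (my own illustration, not from the slides the question mentions). Suppose $x_t$ is a random walk and $y_t$ is cointegrated with it:

$$x_t = x_{t-1} + \varepsilon_t, \qquad y_t = \beta x_t + u_t,$$

where $\varepsilon_t$ and $u_t$ are stationary. The error-correction form of the equation for $\Delta y_t$ is

$$\Delta y_t = \alpha \left( y_{t-1} - \beta x_{t-1} \right) + (\text{lagged differences}) + e_t.$$

The term $y_{t-1} - \beta x_{t-1}$ is the stationary long-run disequilibrium that pulls $y_t$ back toward $\beta x_t$. A VAR fitted to $\Delta y_t$ and $\Delta x_t$ alone omits this term entirely, and that omitted error-correction term is exactly the "lost relevant information."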

The bigger question is what do you plan to use the VAR for?

If you want to perform inference (i.e. hypothesis tests), then you're generally going to have to take first differences (this is because the asymptotic distributions of the test statistics are nonstandard in the presence of random walks). There is an exception if you want to do some Granger causality testing.

If you want to forecast, then there is no need to first-difference. Expectations will not be affected by the fact that the data follow a random walk.

Read my post about this here.

HTH

**Update**

A mathematical proof of this is going to be tricky (and non-trivial). But if you want evidence I'd suggest you simulate. Here is some simple R code to give you more intuition.

```r
N <- 100

sim <- function(coint = TRUE) {
  # Generate a random walk
  n <- 100
  x <- numeric(n)
  x[1] <- 1
  for (i in 2:n) {
    x[i] <- x[i - 1] + rnorm(1)
  }
  # Introduce cointegration; if FALSE, y is the random walk itself
  if (coint) {
    y <- 0.5 * x + rnorm(n)
  } else {
    y <- x
  }
  # Build lagged levels and differences by hand
  # (base R's lag() does not shift plain vectors)
  lx  <- c(NA, x[-n])                  # x lagged once
  dy  <- c(NA, diff(y))                # first difference of y
  dlx <- c(NA, NA, diff(x)[-(n - 1)])  # first difference of x, lagged once
  d <- data.frame(y = y, lx = lx, dy = dy, dlx = dlx)
  dTrain <- d[1:(n - 1), ]
  dTest  <- d[n, ]
  mod1 <- lm(y ~ lx, dTrain)    # regression in levels
  mod2 <- lm(dy ~ dlx, dTrain)  # regression in differences
  # Squared error of the one-step forecast
  res1 <- (predict(mod1, newdata = dTest) - dTest$y)^2
  res2 <- (predict(mod2, newdata = dTest) - dTest$dy)^2
  c(res1, res2)
}

res <- data.frame(MSE1 = rep(NA, N), MSE2 = rep(NA, N))
# Simulate over N = 100 iterations
for (i in 1:N) {
  tmp <- sim(TRUE)
  res[i, "MSE1"] <- tmp[1]
  res[i, "MSE2"] <- tmp[2]
}
mean(res$MSE1)
mean(res$MSE2)
```

The results for the cointegrated case are striking. Less so for the AR(1) case; there, differencing is sometimes better!

Oh, and I forgot: regressing two I(1) processes that are not cointegrated on each other is dangerous. If you simply regress one process on the other, you'll get a spurious regression. To handle this you can either include lags of both processes in the model or take differences.
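To see the spurious-regression problem for yourself, here is a small sketch in the same spirit as the simulation above (my own illustration; the seed and sample size are arbitrary):

```r
set.seed(1)
n <- 200

# Two independent random walks -- there is no true relationship between them
x <- cumsum(rnorm(n))
y <- cumsum(rnorm(n))

# Levels regression: the t-statistic on x is typically wildly inflated,
# suggesting a strong relationship that does not exist ("spurious regression")
summary(lm(y ~ x))

# Differences regression: both series are now I(0) and the problem disappears
summary(lm(diff(y) ~ diff(x)))
```

Run it a few times with different seeds: the levels regression keeps "finding" significant relationships between unrelated series, while the differenced regression behaves as standard theory predicts.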

**Update 2**

Ok, now I understand your question. Thanks for adding more comments. For forecasting purposes there is little difference between a VECM and a VAR in levels when series are I(1) and cointegrated. The former explicitly models the long-run relationship while the latter implicitly models it.

However, as pointed out in the VAR forecasting methodology question above, unit-root tests have low power. They often wrongly assert that a series is I(1). Moreover, structural breaks happen all the time in real-world series. When these breaks are not modeled, it's also really easy to wrongly conclude that a series is I(1). (See the Zivot–Andrews test.)

With this in mind, why would you ever use a VECM to forecast? Think about it. If you want to use a VECM, you first have to test for a unit root, then for cointegration, filter the data, etc. After doing all this, there is still a non-trivial chance that all the work was for naught because the series under consideration are actually I(0). You spend all that time and effort and now you're estimating the wrong model. In contrast, you can skip all that and estimate in levels. At worst you'll get the same performance; at best you'll outperform. That sounds like a sound empirical strategy to me.

For a real-world example, economic theory suggests that commodity prices should follow a stationary process. However, if you naively test corn prices, you'll find a unit root. Why? This paper discusses the details: https://www.jstor.org/stable/4492867?seq=1#page_scan_tab_contents
