I want to compute the following regression using R: `lm(EurOis3~EurepOis3+Vstoxx+log(Open.Market.Operations)+CDS)`. I am using daily data (i.e. I have 5 observations per week, for several years). EurOis3, EurepOis3, Vstoxx, and CDS are in percentage points, whereas the variable Open.Market.Operations is the Euro amount outstanding of the open market operations from the ECB.

First I plotted all the time series using `plot.ts()` and `acf()` to see whether my series have trends and/or unit roots. I also computed the ADF test for each of my series. Since I have 5 observations per week, I specified the ADF test accordingly: `ur.df(x, lags=5, selectlags="AIC", type="drift")`. Does this make sense? For my variables I get the following values of the test statistic:

| Variable | Test statistic |
|---|---|
| `EurOis3` | -2.2579 |
| `EurepOis3` | -2.4168 |
| `Vstoxx` | -3.639 |
| `log(Open.Market.Operations)` | -2.6049 |
| `CDS` | -1.841 |

The critical values are:

| | 1pct | 5pct | 10pct |
|---|---|---|---|
| tau2 | -3.43 | -2.86 | -2.57 |

So it appears that all my variables have a unit root (at the 5% level) except for Vstoxx (even if I select the option with drift and trend), so that series appears to be stationary already. After computing the first difference of each variable and testing again for stationarity, I can reject the hypothesis of non-stationarity for each of my variables. So I assume they are all I(1). Here I plotted the ACF (command: `acf(x)`) for each of my variables, first in level form, then in differenced form:

Now if I look at the ACF of Vstoxx, there is no sign that Vstoxx should be stationary in level form. Or am I missing something?

However, since most of my series are non-stationary, I decided to compute my regression with differenced variables, using the following command: `lm(diff(EurOis3)~diff(EurepOis3)+diff(log(Open.Market.Operations))+diff(CDS))`

Now the problem is that the sign of the estimate for log(Open.Market.Operations) changes from negative (which it should be) to positive, and the estimate becomes insignificant, when I compute the regression with differenced variables instead of level variables. This is counterintuitive and against the existing literature. So my question is: what did I do wrong? What could cause the sign change? Is it possible that I overdifferenced the variable log(Open.Market.Operations)? If I just difference all the other variables except for log(Open.Market.Operations) the result looks better. But I am not sure if this is allowed? Unfortunately, I couldn't find any answer to this question so far. Or are there other effects I have overlooked? Many thanks for any tips and answers.

P.S. I always used HAC from NeweyWest: `coeftest(reg1, vcov=NeweyWest)`

#### Best Answer

Regarding your use of `ur.df`: setting `lags=5` may be fine, but I do not think it can be motivated by the fact that you have daily data with 5-day weeks. So the way you do it is perhaps fine, but the motivation is not. Also, consider whether `type="drift"` is the most appropriate specification (use subject-matter knowledge).
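To see what the lag argument actually does, here is a minimal hand-rolled sketch of the ADF regression with drift in base R. The lagged differences are there to absorb autocorrelation in the regression errors, which has nothing to do with the length of the trading week. The series `y` is simulated and the helper `adf_tstat` is a hypothetical name of mine; this is not the code `ur.df` runs internally, and its output is not guaranteed to match `ur.df` exactly.

```r
set.seed(1)
y <- cumsum(rnorm(500))  # simulated random walk standing in for one of the series

# t-statistic on y_{t-1} in the regression
#   diff(y)_t = a + g * y_{t-1} + sum_{i=1..k} b_i * diff(y)_{t-i} + e_t
adf_tstat <- function(y, k) {
  dy      <- diff(y)
  E       <- embed(dy, k + 1)              # columns: dy_t, dy_{t-1}, ..., dy_{t-k}
  y_lag   <- y[(k + 1):(length(y) - 1)]    # y_{t-1}, aligned with dy_t
  dy_lags <- E[, -1, drop = FALSE]         # the k lagged differences
  fit     <- lm(E[, 1] ~ y_lag + dy_lags)
  coef(summary(fit))["y_lag", "t value"]
}

# Same series, different numbers of lagged differences (k >= 1 in this sketch)
stats <- sapply(1:5, function(k) adf_tstat(y, k))
names(stats) <- paste0("k=", 1:5)
print(round(stats, 3))
```

The statistic has to be compared with Dickey-Fuller critical values (such as the tau2 values you quote), not with the usual t distribution. If the conclusion changes a lot as `k` varies, that is a sign the lag choice deserves more attention than a calendar-based rule gives it.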

Regarding differencing of Vstoxx: if Vstoxx is actually stationary, then by differencing you would induce an MA(1) term with coefficient $\theta=-1$, which makes the series non-invertible; this is kind of nasty and should be avoided.
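The overdifferencing effect is easy to see in a simulation. In the simplest case, where the stationary series is white noise $\varepsilon_t$, the difference is $\varepsilon_t - \varepsilon_{t-1}$, an MA(1) with $\theta=-1$, whose lag-1 autocorrelation is $\theta/(1+\theta^2) = -0.5$. A minimal sketch in base R on simulated data (white noise is my simplifying assumption; your Vstoxx series will be more persistent):

```r
set.seed(42)
x  <- rnorm(5000)   # stationary series (white noise), simulated for illustration
dx <- diff(x)       # overdifferenced: dx_t = e_t - e_{t-1}

# Lag-1 sample autocorrelation of the differenced series; theory gives -0.5
r1 <- acf(dx, lag.max = 1, plot = FALSE)$acf[2]
print(r1)
```

A strongly negative spike at lag 1 in the ACF of a differenced series is a common warning sign that the differencing was not needed.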

*Now if I look at the ACF of Vstoxx, there is no sign that Vstoxx should be stationary in level form. Or am I missing something?* Yes, it does look quite persistent. It may have a near-unit root (rather than a unit root), though. So I(0) versus I(1) is not quite clear.
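To illustrate why the ACF alone cannot settle this, here is a small sketch comparing a stationary but highly persistent AR(1) with a true random walk. The AR coefficient 0.98 and the sample size are arbitrary choices of mine, not estimates from your data.

```r
set.seed(123)
n <- 2000
near_unit <- arima.sim(model = list(ar = 0.98), n = n)  # stationary, highly persistent
unit_root <- cumsum(rnorm(n))                            # random walk, I(1)

# Sample autocorrelations at lags 1..30 (dropping lag 0)
acf_near <- acf(near_unit, lag.max = 30, plot = FALSE)$acf[-1]
acf_unit <- acf(unit_root, lag.max = 30, plot = FALSE)$acf[-1]

# Show a few lags side by side
tab <- cbind(lag = 1:30, near_unit = acf_near, unit_root = acf_unit)
print(round(tab[c(1, 5, 10, 20, 30), ], 2))
```

Both ACFs start close to 1 and decay slowly, so in a finite sample the two cases can look much alike to the eye.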

*This is counterintuitive and against the existing literature.* Are you sure the right-hand-side variables are exogenous? If they are not (so that EurOis3 may be influencing one or more of them) you have a problem and the OLS estimates will be ill-behaved (read about simultaneous equation models (SEM); techniques like 2SLS and 3SLS may be used in these kinds of situations). Also check whether the model residuals are well-behaved (no autocorrelation etc.).
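For the residual check, one option is a Ljung-Box test on the regression residuals. A minimal sketch on simulated data, where the variables `x`, `y` and `e` are made up and the errors are deliberately autocorrelated (`Box.test` is in base R's stats package; the choice of 10 lags is arbitrary):

```r
set.seed(7)
n <- 500
x <- rnorm(n)
e <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))  # AR(1) errors
y <- 1 + 0.5 * x + e

fit <- lm(y ~ x)

# Null hypothesis: no autocorrelation in the residuals up to lag 10
lb <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box")
print(lb)
```

A small p-value points to leftover autocorrelation. Note that Newey-West standard errors adjust the inference for this, but they leave the coefficient estimates themselves unchanged, so they cannot repair a sign that is wrong because of endogeneity.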

*If I just difference all the other variables except for log(Open.Market.Operations) the result looks better. But I am not sure if this is allowed?* Unfortunately, this is not allowed. Having a stationary dependent variable, a bunch of stationary regressors and one integrated regressor does not make sense and is known as *unbalanced regression*; in the long run, you would expect the integrated variable to diverge so it cannot be used as a regressor to explain a stationary variable that is varying around its mean. In other words, the right hand side of the equation would diverge from the left hand side of the equation, and that does not make sense.
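A quick simulation shows the scale mismatch. This is a sketch under my own assumptions: `y` is white noise, `x` is an independent random walk, and both are simulated rather than taken from your data.

```r
set.seed(99)
n <- 2000
y <- rnorm(n)           # stationary dependent variable, stays around its mean
x <- cumsum(rnorm(n))   # integrated regressor, wanders without bound

# The integrated series spreads out far more than the stationary one
print(c(sd_y = sd(y), sd_x = sd(x)))

# The only way to reconcile the two sides is a slope very close to zero
fit   <- lm(y ~ x)
slope <- unname(coef(fit)["x"])
print(slope)
```

Any coefficient on `x` that is meaningfully different from zero would make the fitted values drift away with `x`, while `y` keeps fluctuating around a constant mean.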