It seems that the consensus is that after fitting a linear regression via OLS, when you have included a lagged dependent variable (because it is supported by theory or appears needed based on the data) such as the following AR(1):

$y_{t}=\beta_{0}+\beta_{1}y_{t-1}+\beta_{2}x_{1}+e_{t}$

the resulting coefficients are

1) Always biased

2) If the stationarity condition holds ($|\beta_{1}|<1$) and there is no serial correlation remaining after the lagged DV is added, then the coefficients are consistent.

**Question 1: Is this summary correct?**

**Question 2: If there is no serial correlation in the residuals after adding the lagged dependent variable (one or more lags), do you typically need a standard error robust to serial correlation, like Newey-West?** I would think the answer is no because there is no serial correlation, but Wooldridge (page 431 of the 3rd edition of the introduction) points out that a serial correlation robust standard error "…is valid to use ….in models with lagged dependent variables..". Since I understand that Newey-West standard errors are only appropriate with consistent estimators, this must mean that after the lagged dependent variable is added, there is no more serial correlation. That makes me wonder why it is relevant to point out that these standard errors can be used in models with a lagged DV.

**Question 3: As a general rule, is it better to just NOT use a lagged DV but instead use OLS to get non-biased estimates and then use a Newey-West or similar procedure for inference?**

**ADD AFTER ALECOS RESPONSE:**

Here is a snippet of the page


#### Best Answer

**Question 3)**

In notation to be understood as matrix-vector, assume that the correct specification is $$y = X\beta + \gamma y_{-1}+ e$$ (where $X$ contains the constant and the $x_1$ variable, $e$ is white noise, and $E(e\mid X) =0$), but you specify and estimate instead

$$y = X\beta + u$$ i.e. without including the lagged dependent variable (LAD), and so in reality $u =\gamma y_{-1}+ e$.

Then OLS estimation will give

$$\hat\beta = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + \gamma y_{-1}+ e) $$ $$= \beta + (X'X)^{-1}X'y_{-1}\gamma +(X'X)^{-1}X'e$$

The expected value of the estimator is

$$E(\hat\beta) = \beta + E\Big[(X'X)^{-1}X'y_{-1}\gamma\Big] +E\Big[(X'X)^{-1}X'e\Big]$$ and using the law of iterated expectations

$$E(\hat\beta) = \beta + E\Big(E\Big[(X'X)^{-1}X'y_{-1}\gamma\mid X\Big]\Big) +E\Big(E\Big[(X'X)^{-1}X'e\mid X\Big]\Big)$$

$$= \beta + E\Big((X'X)^{-1}X'E\Big[y_{-1}\gamma\mid X\Big]\Big) +E\Big((X'X)^{-1}X'E\Big[e\mid X\Big]\Big)$$

$$=\beta + E\Big((X'X)^{-1}X'E\Big[y_{-1}\gamma\mid X\Big]\Big) + 0 $$ the last term being zero per our assumptions. But $E\Big[y_{-1}\gamma\mid X\Big] \neq 0$, because $X$ contains *all* the regressors (from all time periods), and so there is correlation with the LAD vector. Therefore $E(\hat\beta) \neq \beta$. In other words, ignoring the lagged dependent variable will not make the estimator unbiased, as long as $\gamma \neq 0$, i.e. as long as the LAD does belong in the regression.
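To make the omitted-lag bias concrete, here is a minimal Monte Carlo sketch. It is only an illustration under assumptions of my own, none of which come from the question: the regressor follows an AR(1) process with coefficient `rho` (so that it is correlated with the lagged dependent variable), the true values are `gamma = 0.5` and `b2 = 1`, and all names are hypothetical. Under these assumed values the large-sample limit of the slope in the short regression works out to `b2 / (1 - gamma * rho)`, about 1.67, rather than 1.

```python
import numpy as np

def short_regression_slopes(n=200, reps=500, gamma=0.5, rho=0.8,
                            b0=1.0, b2=1.0, burn=100, seed=0):
    """Simulate y_t = b0 + gamma*y_{t-1} + b2*x_t + e_t with an AR(1)
    regressor x_t (an assumption of this sketch), then regress y on a
    constant and x only, i.e. with the lagged dependent variable omitted."""
    rng = np.random.default_rng(seed)
    total = n + burn
    v = rng.standard_normal((reps, total))  # innovations of x
    e = rng.standard_normal((reps, total))  # white-noise regression errors
    x = np.zeros((reps, total))
    y = np.zeros((reps, total))
    for t in range(1, total):
        x[:, t] = rho * x[:, t - 1] + v[:, t]
        y[:, t] = b0 + gamma * y[:, t - 1] + b2 * x[:, t] + e[:, t]
    slopes = np.empty(reps)
    for r in range(reps):
        # Drop the burn-in draws, then fit the misspecified model y = X*beta + u.
        X = np.column_stack([np.ones(n), x[r, burn:]])
        slopes[r] = np.linalg.lstsq(X, y[r, burn:], rcond=None)[0][1]
    return slopes

slopes = short_regression_slopes()
mean_slope = slopes.mean()
print(f"true b2 = 1.0, mean estimate without the lag = {mean_slope:.3f}")
```

The average slope across replications stays well away from the true value, which is the point of the derivation: dropping a lag that belongs in the model does not buy unbiasedness.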

**Question 1)**

Assume now that you specify correctly, and denote by $Z$ the regressor matrix that also contains the LAD (with $\beta$ now also including its coefficient). Here (using the same steps as before)

$$\hat\beta = \beta + (Z'Z)^{-1}Z'e$$

and $$E(\hat\beta) = \beta + E\Big((Z'Z)^{-1}Z'E\Big[e\mid Z\Big]\Big)$$

But is $e$ (the vector) independent of $Z$? No, because $Z$ contains the LAD from all time periods bar the most recent, while $e$ contains the errors from all time periods bar the first. Each $y_{t}$ depends on $e_{t}$ and reappears in $Z$ as the regressor for period $t+1$, so even if $e$ is not serially correlated, it is correlated with the vector $y_{-1}$. So indeed, the last term is not zero and $$E(\hat\beta) \neq \beta$$ i.e. the OLS estimator is biased.

But the OLS estimator will be consistent if indeed the inclusion of the LAD eliminates serial correlation, because (using the properties of the plim operator)

$$\operatorname{plim}\hat\beta = \beta + \operatorname{plim}\left(\frac 1{n-1} Z'Z\right)^{-1}\cdot \operatorname{plim}\left(\frac 1{n-1}Z'e\right)$$

Part of the standard assumptions (and rather "easily" satisfied) is that the first plim of the product converges to a finite, invertible matrix. The second plim, written explicitly (and using the stationarity assumption to invoke the LLN), is

$$\operatorname{plim}\left(\frac 1{n-1}Z'\mathbf e\right) = \left[\begin{matrix} \operatorname{plim}\frac 1{n-1}\sum_{i=2}^n e_i \\ \operatorname{plim}\frac 1{n-1}\sum_{i=2}^n x_{i}e_i \\ \operatorname{plim}\frac 1{n-1}\sum_{i=2}^n y_{i-1}e_i \\ \end{matrix}\right] \rightarrow \left[\begin{matrix} E(e_i) \\ E(x_{i}e_i) \\ E(y_{i-1}e_i) \\ \end{matrix}\right]\; \forall i$$

Our assumption $E(e\mid X) = 0$ implies $E(e_i) = 0$ and also $E(x_{i}e_i)=0$, for all $i$.

Finally, IF serial correlation has been removed, then $E(y_{i-1}e_i) =0$ also. So this plim goes to zero and therefore

$$\operatorname{plim}\hat\beta = \beta$$ i.e. the OLS estimator is indeed consistent in this case. So the "summary" is correct.
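The "biased but consistent" conclusion can be checked numerically. The following is a rough simulation sketch, not a proof. It assumes values that are not in the question (`gamma = 0.5`, `b0 = b2 = 1`, an i.i.d. standard normal regressor, white-noise errors) and uses a hypothetical helper name. It fits the correctly specified model by OLS and compares the average estimate of the lag coefficient at a small and a large sample size.

```python
import numpy as np

def mean_gamma_hat(n, reps=2000, gamma=0.5, b0=1.0, b2=1.0, burn=100, seed=0):
    """Average OLS estimate of the lag coefficient over many simulated
    samples of length n from y_t = b0 + gamma*y_{t-1} + b2*x_t + e_t,
    with white-noise e_t and an i.i.d. regressor (assumptions of this sketch)."""
    rng = np.random.default_rng(seed)
    total = n + burn
    x = rng.standard_normal((reps, total))
    e = rng.standard_normal((reps, total))
    y = np.zeros((reps, total))
    for t in range(1, total):
        y[:, t] = b0 + gamma * y[:, t - 1] + b2 * x[:, t] + e[:, t]
    estimates = np.empty(reps)
    for r in range(reps):
        ys, xs = y[r, burn:], x[r, burn:]
        # Z holds the constant, the current x and the lagged y (n - 1 usable rows).
        Z = np.column_stack([np.ones(n - 1), xs[1:], ys[:-1]])
        estimates[r] = np.linalg.lstsq(Z, ys[1:], rcond=None)[0][2]
    return estimates.mean()

bias_small = mean_gamma_hat(20) - 0.5    # finite-sample bias at n = 20
bias_large = mean_gamma_hat(500) - 0.5   # bias at n = 500
print(f"bias of gamma-hat: n=20 -> {bias_small:.4f}, n=500 -> {bias_large:.4f}")
```

In this setup the average estimate sits below the true value when the sample is short, and the gap shrinks as the sample grows, in line with the plim argument.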

**Question 2)**

The full sentence from Wooldridge is

"It is also valid to use the SC-robust standard errors in models with lagged dependent variables assuming, of course, that there is good reason for allowing serial correlation in such models".

meaning, when we have good reasons to believe that the inclusion of lagged dependent variables does not fully remove autocorrelation. And it seems we got ourselves a Catch-22: if serial correlation (SC) has been removed, why use SC-robust standard errors? And if serial correlation has not been removed, our OLS estimator will be inconsistent, so in such a case is it meaningful, useful or appropriate to use asymptotic inference? Well, it appears that if we do suspect that SC still exists, it is better to try to do something about it, regardless. But your comment has merit, and I would suggest contacting Wooldridge directly on the matter, in order to get an authoritative answer.
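The second horn of the Catch-22 can be illustrated with a small sketch. It is a simplified case that I am assuming for illustration, not the model in the question: there is no $x$ variable, the true lag coefficient is `gamma = 0.5`, and the error `u` is itself AR(1) with a hypothetical coefficient `phi = 0.5`, so serial correlation remains after the lag is included. In this special case $y$ follows an AR(2) process, and the OLS slope on $y_{t-1}$ converges to the first autocorrelation of $y$, `(gamma + phi) / (1 + gamma * phi)`, rather than to `gamma`.

```python
import numpy as np

gamma, phi, n = 0.5, 0.5, 50_000   # assumed values, chosen for illustration only
rng = np.random.default_rng(0)
e = rng.standard_normal(n)

u = np.zeros(n)  # serially correlated error: u_t = phi*u_{t-1} + e_t
y = np.zeros(n)  # y_t = gamma*y_{t-1} + u_t
for t in range(1, n):
    u[t] = phi * u[t - 1] + e[t]
    y[t] = gamma * y[t - 1] + u[t]

# OLS of y_t on a constant and y_{t-1}, ignoring the remaining serial correlation.
Z = np.column_stack([np.ones(n - 1), y[:-1]])
coef = np.linalg.lstsq(Z, y[1:], rcond=None)[0]
gamma_hat = coef[1]

limit = (gamma + phi) / (1 + gamma * phi)  # large-sample limit in this special case
print(f"true gamma = {gamma}, OLS estimate = {gamma_hat:.3f}, limit = {limit:.3f}")
```

Even with a very long sample the estimate settles near the limit value and not near the true coefficient, so robust standard errors would be describing the precision of an estimate centred on the wrong number.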
