In the Breusch-Godfrey test we use a model

$$ e_t = \varepsilon_t + \beta_1 \varepsilon_{t-1} + \dots + \beta_p \varepsilon_{t-p}. $$

If we reject the null hypothesis of no serial correlation in the errors, it means that the residuals follow an autoregressive model of order $p$.

If I want to avoid this problem, I must add a certain number of lags of the response variable $y$ as regressors in the original model. However, in some cases this method is not useful.

Are there other options to consider?

#### Best Answer

If your regression-type model has serially correlated residuals, as a remedy you may include lags of the dependent variable as regressors, just as you mentioned. However, you might wish to preserve the original model for convenience of interpretation, direct representation of a theoretical model or other reasons. In that case, you have three options:

- Do what TPArrow suggested, i.e. keep the original model but allow the model errors to follow an AR process, and use penalized estimation. This way you get a *penalized regression with AR errors*.
- Keep the original model but allow the model errors to follow an ARMA (or more generally, SARIMA) process. This way you get a *regression with ARMA errors*.
- Leave the model specification intact and use heteroskedasticity- and autocorrelation-consistent (HAC) standard errors.

Let us examine the options in more detail:

**1.**

See the answer by TPArrow.

**2.**

The model can be estimated using the functions `arima` ("stats" package) or `auto.arima` ("forecast" package) in R. You enter the regressors via the argument `xreg` and select the autoregressive and moving-average lag orders either manually (with `arima`) or automatically (with `auto.arima`).
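As a minimal sketch of option 2, the following fits a regression with AR(1) errors using `arima` from "stats". The data are simulated, and the names `x`, `y`, `e` as well as the chosen order `c(1, 0, 0)` are assumptions made purely for illustration, not part of the original question.

```r
# Simulated data (hypothetical): y depends on x, and the errors are AR(1).
set.seed(1)
n <- 300
x <- rnorm(n)
e <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))
y <- 1 + 2 * x + e

# Plain OLS leaves the serial correlation in the residuals.
ols <- lm(y ~ x)
lb_ols <- Box.test(residuals(ols), lag = 10, type = "Ljung-Box")

# Regression with AR(1) errors: the regressors enter via `xreg`,
# the orders of the error process (p, d, q) via `order`.
fit <- arima(y, order = c(1, 0, 0), xreg = cbind(x = x))
lb_fit <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 1)

print(coef(fit))         # coefficients named ar1, intercept, x
print(lb_ols$p.value)    # Ljung-Box p-value for the OLS residuals
print(lb_fit$p.value)    # Ljung-Box p-value after modelling the errors
```

If the "forecast" package is installed, `auto.arima(y, xreg = cbind(x = x))` should select the error orders automatically instead of fixing them by hand.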

Comparing **1.** with **2.**, the question is whether

- not allowing for moving average components in the error process but using penalization OR
- allowing for moving average components but not using penalization

works better for your particular example. I expect neither of the two approaches to be *uniformly* better, so you could try both and see which gives better results. This could be evaluated, for example, by estimating the models on part of the original sample and examining their performance on the remaining part.
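The hold-out comparison described above might look roughly like this. The sketch uses simulated data and, as stand-ins for the competing approaches, compares plain OLS with a regression with AR(1) errors; the penalized model from option 1 is not included, and the split point of 250 observations is an arbitrary assumption.

```r
# Simulated data (hypothetical): y depends on x, and the errors are AR(1).
set.seed(2)
n <- 300
x <- rnorm(n)
e <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))
y <- 1 + 2 * x + e

# Chronological split: estimate on the first 250 points, hold out the last 50.
n_train <- 250
train <- 1:n_train
test  <- (n_train + 1):n

rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Candidate A: plain OLS.
ols <- lm(y ~ x, data = data.frame(y = y[train], x = x[train]))
pred_ols <- predict(ols, newdata = data.frame(x = x[test]))

# Candidate B: regression with AR(1) errors.
# These are multi-step forecasts made from the end of the training sample.
fit <- arima(y[train], order = c(1, 0, 0), xreg = cbind(x = x[train]))
pred_ar <- predict(fit, n.ahead = length(test),
                   newxreg = cbind(x = x[test]))$pred

rmse_ols <- rmse(y[test], pred_ols)
rmse_ar  <- rmse(y[test], as.numeric(pred_ar))
print(c(ols = rmse_ols, ar1_errors = rmse_ar))
```

Which candidate wins depends on the data and on the forecast horizon, so the comparison is best repeated on your own sample rather than read off this toy example.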

**3.**

Using HAC-robust standard errors may appear convenient but need not be the best option. Francis X. Diebold warns against that in his blog posts "The HAC Emperor has no Clothes" and "The HAC Emperor has no Clothes: Part 2" (and I am with him, if my voice counts):

> Punting via kernel-HAC estimation is a bad idea in time series, for several reasons:
>
> (1) Kernel-HAC is not likely to produce good $\beta$ estimates [and that is important in not-so-large samples]. <…>
>
> (2) Kernel-HAC is not likely to produce good $\beta$ inference [because] <…> kernel-HAC standard errors may be unnecessarily unreliable in small samples, even if they're accurate asymptotically.
>
> (3) Most crucially, kernel-HAC fails to capture invaluable predictive information. <…>
>
> The clearly preferable approach is traditional parametric disturbance heteroskedasticity / autocorrelation modeling, with GLS/ML estimation. Simply allow for ARMA(p,q)-GARCH(P,Q) disturbances (say), with p, q, P and Q selected by AIC (say). (In many applications something like AR(3)-GARCH(1,1) or ARMA(1,1)-GARCH(1,1) would be more than adequate.)

(I encourage you to read the entire posts. They are quite short, very accessible and (last but not least) authored by a respected time series econometrician.)
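For reference, the kernel-HAC standard errors from option 3 can be computed by hand in base R. This is only a sketch of a Bartlett-kernel (Newey-West) estimator on simulated data, without any small-sample adjustment; the bandwidth rule and all variable names are assumptions chosen for illustration.

```r
# Simulated data (hypothetical): both the regressor and the errors are autocorrelated.
set.seed(3)
n <- 300
x <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))
e <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))
y <- 1 + 2 * x + e

fit <- lm(y ~ x)
X <- model.matrix(fit)
u <- residuals(fit)
scores <- X * u                      # row t holds x_t * u_t

# Bartlett-kernel estimate of the long-run variance of the scores.
L <- floor(4 * (n / 100)^(2 / 9))    # a common rule-of-thumb bandwidth (assumption)
S <- crossprod(scores)               # lag-0 term
for (l in seq_len(L)) {
  w <- 1 - l / (L + 1)               # Bartlett weight for lag l
  G <- crossprod(scores[(l + 1):n, , drop = FALSE],
                 scores[1:(n - l), , drop = FALSE])
  S <- S + w * (G + t(G))
}

# Sandwich formula: (X'X)^{-1} S (X'X)^{-1}.
bread <- solve(crossprod(X))
V_hac <- bread %*% S %*% bread
se_hac <- sqrt(diag(V_hac))
se_ols <- sqrt(diag(vcov(fit)))
print(rbind(ols = se_ols, hac = se_hac))
```

Note that the coefficient estimates are still the OLS ones; only the standard errors change. In practice, the "sandwich" package provides ready-made estimators such as `NeweyWest` and `vcovHAC`.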