# Solved – Seasonal and autocorrelated regression residuals: difference raw data or residuals

I have a multiple linear model for time series data for which the regression residuals are autocorrelated and display seasonal behavior. This seasonal behavior is induced deterministically by a cyclic variable written into the model. In order to calculate corrected standard errors for the regression coefficients, I intend to use generalized least squares with correction for the autocorrelation in the residuals.

But for the seasonality, I am not sure which time series I should difference: the raw data (observations) or the regression residuals. For the data, I could superimpose each cycle and take the mean, thus removing seasonal effects, but this would alter the data and would also generalize badly to other kinds of data. For the regression residuals, I don't know whether I should difference the residual series by subtracting at each period, or handle it as some kind of seasonal ARMA process.

Any hints?

LAST EDIT: The problem behind this question may already have been resolved, and may have been the product of a misconception.

The regression was being run on simulations of the model. These simulations contained no stochastic error term, so they were purely deterministic. The regression errors merely reflected an imperfect fit to the data, and the regression residuals obviously followed a deterministic pattern. This pattern was not the result of some neglected explanatory variable in the model, so there was no reason to model it (by means of an ARMA model or of any other kind). After adding white noise to the data (which should be a crucial step in any simulation of real data), the regression residuals were mostly dominated by stochasticity and lost their autocorrelated behavior.
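The effect described in this edit can be reproduced with a small sketch. Everything specific here is an assumption made for illustration, not the original simulation: a cyclic regressor with period 12, a deterministic response `exp(x)` that a straight line cannot fit perfectly, a noise standard deviation of 1.0, and the hypothetical helper names `ols_line`, `autocorr` and `residual_acf`.

```python
import math
import random

def ols_line(x, y):
    """Closed-form OLS for y = a + b*x (one regressor plus intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def autocorr(r, lag):
    """Sample autocorrelation of the series r at the given lag."""
    n = len(r)
    m = sum(r) / n
    c0 = sum((v - m) ** 2 for v in r)
    ck = sum((r[t] - m) * (r[t - lag] - m) for t in range(lag, n))
    return ck / c0

def residual_acf(y, x, lag):
    """Fit y ~ a + b*x by OLS and return the residual autocorrelation at lag."""
    a, b = ols_line(x, y)
    res = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return autocorr(res, lag)

random.seed(0)
n, period = 240, 12  # assumed: 20 cycles of length 12

# Cyclic regressor and a purely deterministic, nonlinear response (assumed form).
x = [math.sin(2 * math.pi * t / period) for t in range(n)]
y_det = [math.exp(xi) for xi in x]

# Same response with white noise added (standard deviation 1.0 is an assumption).
y_noisy = [yi + random.gauss(0, 1.0) for yi in y_det]

acf_det = residual_acf(y_det, x, period)
acf_noisy = residual_acf(y_noisy, x, period)
print("seasonal-lag residual autocorrelation, deterministic:", round(acf_det, 3))
print("seasonal-lag residual autocorrelation, with noise:   ", round(acf_noisy, 3))
```

Without noise, the residuals repeat exactly every 12 steps, so the seasonal-lag autocorrelation sits near its maximum for this sample length. With noise of this size the deterministic misfit is a small share of the residual variance and the autocorrelation largely disappears.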


> In order to calculate corrected standard errors for the regression coefficients, I intend to use generalized least squares with correction for the autocorrelation in the residuals.

Note that generalized least squares (GLS) would affect not only the standard errors but also the point estimates. In any case, you could gain power by estimating a regression with an explicitly specified error structure, e.g. a regression with ARMA errors, as can be done with the functions `stats::arima` or `forecast::auto.arima` in R. These use maximum likelihood estimation instead of GLS. See the related blog posts by Francis X. Diebold, "The HAC Emperor has no Clothes" and "The HAC Emperor has no Clothes: Part 2", where he encourages explicit error specification as a way to get better coefficient estimates and gain predictive power. Although he discusses the case of HAC there, I believe similar conclusions apply here, too.
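To make the first point concrete, here are the textbook estimators, written under the assumption (mine, for illustration) that the model is $y = X\beta + \eta$ with $\operatorname{Var}(\eta) = \sigma^2 \Omega$:

$$
\hat\beta_{\text{OLS}} = (X'X)^{-1} X'y,
\qquad
\hat\beta_{\text{GLS}} = (X'\Omega^{-1}X)^{-1} X'\Omega^{-1}y .
$$

The two coincide only in special cases (for example $\Omega = I$), so moving from OLS to GLS generally changes the coefficients themselves and not just their standard errors. A regression with ARMA($p$, $q$) errors specifies the structure of $\eta$ explicitly:

$$
y_t = x_t'\beta + \eta_t,
\qquad
\phi(B)\,\eta_t = \theta(B)\,\varepsilon_t,
\qquad
\varepsilon_t \sim \text{WN}(0, \sigma^2),
$$

where $B$ is the backshift operator and $\beta$, $\phi$, $\theta$ are estimated jointly.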

> But for the seasonality, I am not sure which time series I should difference: the raw data (observations) or the regression residuals.

Since the problem arises due to a cyclic regressor, you could remove the deterministic component of the cyclic variable before including it in the model, or alternatively you could include some seasonal terms (dummies or Fourier terms) in the model.
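As a rough sketch of the second option, the following compares a regression without seasonal terms to one that includes a single pair of Fourier terms. The data-generating process (period 12, coefficients 1.0, 0.5, 2.0 and 1.0, noise standard deviation 0.5) and the helper names `solve`, `ols`, `residuals` and `autocorr` are all assumptions made for illustration.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ols(X, y):
    """OLS coefficients via the normal equations; X is a list of rows."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def residuals(X, y, beta):
    return [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(X, y)]

def autocorr(r, lag):
    """Sample autocorrelation of the series r at the given lag."""
    m = sum(r) / len(r)
    c0 = sum((v - m) ** 2 for v in r)
    ck = sum((r[t] - m) * (r[t - lag] - m) for t in range(lag, len(r)))
    return ck / c0

random.seed(1)
n, period = 240, 12  # assumed sample size and seasonal period
w = 2 * math.pi / period

# Assumed data: one ordinary regressor z plus a deterministic seasonal cycle.
z = [random.gauss(0, 1) for _ in range(n)]
y = [1.0 + 0.5 * z[t] + 2.0 * math.sin(w * t) + 1.0 * math.cos(w * t)
     + random.gauss(0, 0.5) for t in range(n)]

X_plain = [[1.0, z[t]] for t in range(n)]
X_fourier = [[1.0, z[t], math.sin(w * t), math.cos(w * t)] for t in range(n)]

beta_plain = ols(X_plain, y)
beta_fourier = ols(X_fourier, y)
acf_plain = autocorr(residuals(X_plain, y, beta_plain), period)
acf_fourier = autocorr(residuals(X_fourier, y, beta_fourier), period)

print("coefficients with Fourier terms:", [round(b, 2) for b in beta_fourier])
print("seasonal-lag residual autocorrelation without seasonal terms:", round(acf_plain, 3))
print("seasonal-lag residual autocorrelation with Fourier terms:    ", round(acf_fourier, 3))
```

One sine/cosine pair is enough here only because the simulated cycle is a single sinusoid; a less smooth seasonal pattern would need more harmonics or a full set of seasonal dummies.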

> For the data, I could superimpose each cycle and take the mean, thus removing seasonal effects, but this would alter the data and would also generalize badly to other kinds of data.

I am a little confused here, but I will try to address this nevertheless.
With regard to the regressor, you can adjust it using a model, so altering the data is not really a problem: you keep track of how you did it, and you can recreate the original variable if you need to.
Regarding generalization, if the cyclic behaviour is unique to this instance, leaving it untreated would not help. If, on the other hand, it is similar across this instance and the ones you want to generalize to, you would lose nothing by removing the deterministic component before running the regression and then using it to adjust the other cases in the same way.

A technical note: if you are doing a regression with ARMA errors, then it is the error that gets differenced. If the error follows some SARIMA process, the regular treatment of SARIMA models applies (roughly speaking, you do not have to worry that it is a regression error rather than raw data).
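A short derivation may help show what differencing the error means in practice. It assumes a single seasonal difference of period $s$, and the symbols are illustrative rather than taken from the question:

$$
y_t = x_t'\beta + \eta_t,
\qquad
(1 - B^s)\,\eta_t = u_t,
$$

with $u_t$ a stationary ARMA process. Applying $(1 - B^s)$ to both sides of the regression equation gives

$$
(1 - B^s)\,y_t = \big((1 - B^s)\,x_t\big)'\beta + u_t .
$$

Differencing the error is therefore equivalent to differencing the response and every regressor in the same way, and $\beta$ keeps its interpretation.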
