EDIT: I've modified my STAN code and it looks like I am getting numbers close to using R's `arima`

. The original code, now moved to the end, was incorrect.

I've been using STAN for simple things, but want to do a linear regression that has ARMA(1,1) noise. I've heard that this is basically equivalent to AR(1) + White Noise. I've made an attempt that gives reasonable numbers but I can't tell if I'm actually doing what I think I'm doing.

The reason I'm attempting to do this is that the data is a climate temperature time series, which is thus serially correlated, and so I believe OLS regression — used to calculate a trend — will underestimate the uncertainty in the regression, which is reflected in the SE's of the slope and intercept coefficients. (And also in the residual error?)

My code, using `rstan`

:

`temps <- structure(c(-0.59, -0.17, 0.05, -0.7, -0.27, -0.94, -0.69, -0.96, -0.58, -0.35, -0.58, -0.54, -0.48, -1.41, -0.82, -0.73, -0.48, -0.37, -0.07, -0.16, -0.58, -0.43, -0.16, -0.19, -0.81, -0.37, -0.52, -0.55, -0.51, -0.85, -0.43, -0.72, -0.43, -0.63, 0.16, -0.26, -0.14, -0.48, -0.61, -0.36, -0.05, 0.22, -0.34, -0.23, -0.2, -0.18, 0.51, -0.2, 0.28, -0.53, -0.07, 0.09, 0.45, -0.27, -0.12, -0.35, -0.21, 0.11, 0.37, 0.09, -0.18, 0.14, 0.35, -0.16, 0.62, 0.04, 0.32, -0.12, 0.41, 0.28, -0.4, -0.34, 0.2, 0.26, -0.27, 0.44, -0.11, -0.15, 0.68, 0.24, 0.12, 0.16, 0.33, 0.12, -0.04, -0.01, -0.22, -0.14, -0.26, -0.42, -0.01, -0.1, -0.57, 0.16, -0.31, 0.12, 0.06, -0.19, 0.1, -0.01, 0.16, 0.81, -0.13, 0.53, 0.31, 0.06, 0.34, 0.23, 0.57, 0.07, 0.41, 0.51, 0.52, 0.39, 0.34, 0.73, 0.3, 0.38, 0.66, 0.6, 0.35, 0.55, 0.98, 0.89, 0.67, 0.86, 0.6, 1.31, 0.2, 0.75, 0.72, 0.53, 0.48), .Tsp = c(1880, 2012, 1), class = "ts") stan.code1 <- "data { int<lower=1> N ; real x[N] ; real y[N] ; } parameters { real alpha ; real beta ; real kappa ; real<lower=0> sigma0 ; } model { real sigma[N] ; alpha ~ cauchy (0, 5) ; beta ~ cauchy (0, 5) ; kappa ~ gamma (1.2, 1) ; sigma0 ~ gamma (3, 1) ; sigma[1] <- sigma0 ; y[1] ~ normal (alpha + beta * x[1], sigma[1]) ; for (n in 2:N) { sigma[n] <- sigma0 + kappa * sigma[n-1] ; y[n] ~ normal (alpha + beta * x[n], sigma[n]) ; } }" stan.list1 <- list (N=length (temps), x=1880:2012, y=temps) stan.model1 <- stan (model_code=stan.code1, model_name="GISS NH Jan", data=stan.list1, iter=15000, chain=4) print (stan.model1, digits_summary=8) `

The alpha, beta, and sigma appear to be reasonable if I use them to throw lines onto the data using `abline`

.

QUESTIONS:

Have I really done a linear regression with AR(1) noise plus white noise? That is ARMA(1,1) noise? I believe $kappa={{1-theta}over{1-phi}}$.

Given that it is done properly — either I lucked out, or someone will post a correct version of the STAN code — how would I use kappa in calculating the larger uncertainty interval around the regression?

EDIT: Original code I used, with `rstan`

:

`temps <- structure(c(-0.59, -0.17, 0.05, -0.7, -0.27, -0.94, -0.69, -0.96, -0.58, -0.35, -0.58, -0.54, -0.48, -1.41, -0.82, -0.73, -0.48, -0.37, -0.07, -0.16, -0.58, -0.43, -0.16, -0.19, -0.81, -0.37, -0.52, -0.55, -0.51, -0.85, -0.43, -0.72, -0.43, -0.63, 0.16, -0.26, -0.14, -0.48, -0.61, -0.36, -0.05, 0.22, -0.34, -0.23, -0.2, -0.18, 0.51, -0.2, 0.28, -0.53, -0.07, 0.09, 0.45, -0.27, -0.12, -0.35, -0.21, 0.11, 0.37, 0.09, -0.18, 0.14, 0.35, -0.16, 0.62, 0.04, 0.32, -0.12, 0.41, 0.28, -0.4, -0.34, 0.2, 0.26, -0.27, 0.44, -0.11, -0.15, 0.68, 0.24, 0.12, 0.16, 0.33, 0.12, -0.04, -0.01, -0.22, -0.14, -0.26, -0.42, -0.01, -0.1, -0.57, 0.16, -0.31, 0.12, 0.06, -0.19, 0.1, -0.01, 0.16, 0.81, -0.13, 0.53, 0.31, 0.06, 0.34, 0.23, 0.57, 0.07, 0.41, 0.51, 0.52, 0.39, 0.34, 0.73, 0.3, 0.38, 0.66, 0.6, 0.35, 0.55, 0.98, 0.89, 0.67, 0.86, 0.6, 1.31, 0.2, 0.75, 0.72, 0.53, 0.48), .Tsp = c(1880, 2012, 1), class = "ts") stan.code1 <- "data { int<lower=1> N ; real x[N] ; real y[N] ; } parameters { real alpha ; real beta ; real kappa ; real<lower=0> sigma ; } model { alpha ~ cauchy (0, 5) ; beta ~ cauchy (0, 5) ; kappa ~ cauchy (0, 5) ; sigma ~ gamma (3, 1) ; y[1] ~ normal (alpha + beta * x[1], sigma) ; for (n in 2:N) { y[n] ~ normal (alpha + beta * x[n] + kappa * y[n-1], sigma) ; } }" stan.list1 <- list (N=length (temps), x=1880:2012, y=temps) stan.model1 <- stan (model_code=stan.code1, model_name="GISS NH Jan", data=stan.list1, iter=15000, chain=4) print (stan.model1, digits_summary=8) `

**Contents**hide

#### Best Answer

This is a first order autoregressive model and the noise is just normal, you aren't doing anything to it. What you should use depends on wether the autocorrelation is induced by dynamic misspecification (which your current formulation takes care of if the mean is in fact an ar1 process) or whether the dgp induces autocorrelated errors on top of mean autoregression. Autocorrelated errors could also be due to omitted variables or another form of misspecification.