I'm fitting the generalized extreme value distribution (GEV) to a series of annual maxima of variable $X$. $X$ exhibits a linear trend.
When I fit the GEV to $X$, I think I have the choice to
1. Use linear regression to compute the slope over time, and remove the slope
   a) $X_i = B_0 + B_1 \times Y_i + e_i;~ e \sim N(0, \sigma^2)$; $Y$ is a time index, e.g., year.
   b) The residuals, $e$, of the regression are the stationary series, $X_s$
2. Introduce $Y$ as a co-variate into my ML fit of the GEV distribution
So pretty much I can make the time series stationary, then fit the GEV, or I could introduce a co-variate into my GEV fit, and do it all at once.
Ultimately I'm asking if I can use the two procedures interchangeably, or if one is more appropriate.
For a bit more background, I'm interested in the answer b/c I actually have ~70 independent series of $X$, some of which are stationary, and some of which are not. If I wanted to detrend all of them, "just to be sure", I'd rather use procedure 1. If I have to use procedure 2, I'd be more selective about which ones I detrended, b/c I don't want to introduce an extra parameter into the ML fit. This seemingly unfair trade-off made me suspicious of the validity of procedure 1: Am I fitting a parameter for free?
I'm interested in the GEV parameters and their s.e.'s for each series of $X$, especially the shape parameter. I also want to create plots of "return level" vs. "return time". The shape of these plots is determined by the GEV shape parameter, but the location parameter (I think) just shifts the return level up or down (i.e., across return time, return level is linear in the location and scale parameters, and nonlinear in the shape parameter).
Best Answer
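The Gaussian regression model is $Y_t \sim \textrm{Norm}(\nu_t,\,\tau^2)$ with $\nu_t = \beta_0 + \beta_1 X_t$, where $Y_t$ now denotes the response and $X_t$ the time covariate (the reverse of the question's notation). It involves $3$ unknown parameters $\beta_0$, $\beta_1$ and $\tau^2$. The ML estimates of $\beta_0$ and $\beta_1$ are given by OLS.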
Using the notation of Stuart Coles' book, an alternative model is $Y_t \sim \textrm{GEV}(\mu_t,\,\sigma,\,\xi)$ with varying location parameter $\mu_t = \beta_0 + \beta_1 X_t$, and $\sigma$ and $\xi$ unknown. This is an extreme value regression involving $4$ parameters $\beta_0$, $\beta_1$, $\sigma$ and $\xi$. The two models are distinct, and the ML estimates of $\beta_0$ and $\beta_1$ will differ between them, as will their standard errors.
While the $\beta_1$ coefficients of the two models can be compared, plugging an OLS estimate into the GEV model is at best a "quick and dirty" solution. The GEV model is easily fitted by ML with the CRAN R packages ismev and evd. These packages also produce the return level plots that you need, and ismev also fits models with $\sigma$ dependent on $t$.
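As a rough sketch of the ismev route, the following fits a trend in the location, and then also in the scale, using `gev.fit`. The simulated data, the rescaled time covariate and the log link for the scale are illustrative assumptions, not part of the analysis described above.

```r
library(ismev)

set.seed(1)                       # arbitrary seed, for reproducibility only
n  <- 200
X  <- 1:n
xi <- 0.15

## simulate GEV(0, 1, xi) noise by inverting the CDF, then add a linear trend
U <- runif(n)
Y <- ((-log(U))^(-xi) - 1) / xi + 0.015 * X

## covariate matrix; time is rescaled to (0, 1] to help the optimiser (assumption)
ydat <- matrix(X / n, ncol = 1)

## location linear in time, constant scale and shape
fit.mu <- gev.fit(Y, ydat = ydat, mul = 1, show = FALSE)

## location linear in time, scale log-linear in time
fit.musig <- gev.fit(Y, ydat = ydat, mul = 1, sigl = 1,
                     siglink = exp, show = FALSE)

fit.mu$mle   # beta0, beta1 (per unit of rescaled time), sigma, xi
fit.mu$se    # corresponding standard errors

## likelihood-ratio statistic for the extra scale coefficient
lr <- 2 * (fit.mu$nllh - fit.musig$nllh)
lr
qchisq(0.95, df = 1)   # reference value for a 5% test
```

Because the covariate is rescaled, the fitted slope is per unit of `X / n`; divide it by `n` to express it per time step.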
The notion of return level is clear for stationary models but needs much care for non-stationary models, since the 100-year (say) return level will vary with time $t$.
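To make the time dependence concrete, here is a minimal base-R sketch that evaluates the 100-year return level at a few time points under a linear trend in the location. The helper `gev_return_level` and the parameter values are hypothetical and chosen only for illustration; they are not estimates from any data.

```r
## m-year return level of a GEV(mu, sigma, xi) distribution, for xi != 0
## (hypothetical helper, written out from the GEV quantile function)
gev_return_level <- function(m, mu, sigma, xi) {
  w <- -log(1 - 1 / m)
  mu - sigma / xi * (1 - w^(-xi))
}

## illustrative (assumed) parameter values
beta0 <- 0; beta1 <- 0.015; sigma <- 1; xi <- 0.15

X    <- c(1, 100, 200)        # time points of interest
mu_t <- beta0 + beta1 * X     # time-varying location

rl100 <- gev_return_level(100, mu_t, sigma, xi)
data.frame(X = X, mu = mu_t, rl100 = rl100)
```

With only the location depending on time, the return level at every return period moves up by `beta1` per time step, so any reported return level has to state the time it refers to.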
In the constant parameter case, the location parameter $\mu$ of the GEV distribution indeed just shifts the distribution, the same being true in the Gaussian regression. Its effect on the $m$-year return level $y_m$ can be investigated by studying the derivative $\partial y_m/\partial \mu$. Roughly speaking, the $3$ parameters $\mu$, $\sigma$ and $\xi$ have increasing impact on the tail distribution.
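A sketch of that derivative calculation, assuming the usual return level formula for $\xi \neq 0$ and writing $w_m = -\log(1 - 1/m)$ (a symbol introduced here only for convenience):

$$y_m = \mu - \frac{\sigma}{\xi}\left[1 - w_m^{-\xi}\right],$$

$$\frac{\partial y_m}{\partial \mu} = 1, \qquad \frac{\partial y_m}{\partial \sigma} = -\frac{1}{\xi}\left[1 - w_m^{-\xi}\right], \qquad \frac{\partial y_m}{\partial \xi} = \frac{\sigma}{\xi^2}\left[1 - w_m^{-\xi}\right] - \frac{\sigma}{\xi}\, w_m^{-\xi} \log w_m.$$

So $y_m$ is linear in $\mu$ and $\sigma$ and nonlinear in $\xi$. The derivative with respect to $\mu$ is the same at every return period, whereas for $\xi > 0$ the derivatives with respect to $\sigma$ and $\xi$ grow as $m$ increases (since $w_m \to 0$), the latter fastest because of the extra $\log w_m$ factor.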
    library(evd)
    n <- 200; beta1 <- 0.015; X <- 1:n
    ## simulate data
    Y <- rgev(n = n, loc = 0, scale = 1, shape = 0.15) + beta1 * X
    ## fit gaussian and GEV regressions
    fit.lm <- lm(Y ~ X)
    summary(fit.lm)
    fit.gev <- fgev(Y, nsloc = data.frame(X))
    fit.gev
    AIC(fit.lm)
    AIC(fit.gev)
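To see how the two procedures from the question compare on the same simulated data, one could run something like the following. It is only a sketch: the seed is arbitrary, and fitting a stationary GEV with `fgev` to the OLS residuals is an assumption about how "procedure 1" would be carried out.

```r
library(evd)

set.seed(123)                      # arbitrary seed, for reproducibility only
n <- 200; beta1 <- 0.015; X <- 1:n
Y <- rgev(n = n, loc = 0, scale = 1, shape = 0.15) + beta1 * X

## Procedure 1: remove the OLS trend, then fit a stationary GEV to the residuals
fit.lm       <- lm(Y ~ X)
res          <- as.numeric(residuals(fit.lm))
fit.two.step <- fgev(res)

## Procedure 2: joint ML fit with a linear trend in the location parameter
fit.joint <- fgev(Y, nsloc = data.frame(X))

## slope from OLS, and the full parameter vector of the joint fit
coef(fit.lm)["X"]
fit.joint$estimate

## scale and shape estimates, then their standard errors, side by side
rbind(two.step = fit.two.step$estimate[c("scale", "shape")],
      joint    = fit.joint$estimate[c("scale", "shape")])
rbind(two.step = fit.two.step$std.err[c("scale", "shape")],
      joint    = fit.joint$std.err[c("scale", "shape")])
```

The standard errors from the two-step fit treat the residuals as if they were observed data, so they do not reflect the uncertainty in the estimated trend; this is the sense in which the slope is obtained "for free". The location estimate of the residual fit is also not comparable to $\beta_0$, since OLS residuals have mean zero.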