I always believed that time should not be used as a predictor in regressions (including GAMs), because then one would simply be "describing" the trend itself. If the aim of a study is to find environmental parameters, such as temperature, that explain the variance in, say, the activity of an animal, then I wonder how time can be of any use, except perhaps as a proxy for unmeasured parameters?
Some trends over time in activity data of harbor porpoises can be seen here:
-> How to handle gaps in a time series when doing GAMM?
My problem is: when I include time in my model (measured in Julian days), 90% of the other parameters become non-significant (the ts shrinkage smoothers from mgcv shrink them out). If I leave time out, some of them are significant…
The question is: is time allowed as a predictor (maybe even needed?) or is it messing up my analysis?
many thanks in advance
Best Answer
Time is allowed; whether it is needed will depend on what you are trying to model. The problem you have is that your covariates, taken together, appear to fit the trend in the data, which Time can do just as well but using fewer degrees of freedom, so they get dropped instead of Time.
If the interest is in modelling the system, i.e. the relationship between the response and the covariates over time, rather than in modelling how the response varies over time, then do not include Time as a covariate. If the aim is to model the change in the mean level of the response over time, include Time but not the covariates. From what you say, it appears that you want the former, not the latter, and should not include Time in your model. (But do consider the extra information below.)
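To make the distinction concrete, here is a minimal sketch of the two modelling choices in mgcv. The data frame porpoise and the variables activity, jday (Julian day), temp and depth are hypothetical stand-ins for your data; the point is only to show how a shrinkage smoother on Time can absorb the trend that the environmental covariates would otherwise have to explain.

```r
library(mgcv)

## Model of the system: response as a function of environmental covariates only
m_env <- gam(activity ~ s(temp, bs = "ts") + s(depth, bs = "ts"),
             data = porpoise, method = "REML")

## Same model with Time (Julian day) added; if s(jday) captures the trend with
## fewer degrees of freedom, the shrinkage smoothers can shrink temp and depth
## to (near) zero effective degrees of freedom
m_time <- gam(activity ~ s(jday, bs = "ts") + s(temp, bs = "ts") + s(depth, bs = "ts"),
              data = porpoise, method = "REML")

summary(m_env)
summary(m_time)  ## look at the edf column: terms shrunk out have edf close to 0
```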
There are a couple of caveats, though. For the theory to hold, the residuals should be i.i.d. (or at least identically distributed, if you relax the independence assumption by using a correlation structure). If you are modelling the response as a function of the covariates and they do not adequately capture the trend in the data, the residuals will contain a trend, which violates the assumptions of the theory unless the fitted correlation structure can cope with it.
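As a hedged sketch of that caveat (using the same hypothetical porpoise data as above): check the residuals of the covariate-only model for a left-over trend, and if there is residual dependence, hand it to a correlation structure via gamm().

```r
library(mgcv)
library(nlme)

m_env <- gamm(activity ~ s(temp) + s(depth), data = porpoise)

## Residuals plotted against time should show no systematic trend
plot(porpoise$jday, resid(m_env$lme, type = "normalized"),
     xlab = "Julian day", ylab = "Normalized residuals")
acf(resid(m_env$lme, type = "normalized"))  ## check for autocorrelation

## If the residuals are autocorrelated, refit with e.g. an AR(1) structure
## (the time covariate must be integer-valued for corAR1)
m_env_ar1 <- gamm(activity ~ s(temp) + s(depth), data = porpoise,
                  correlation = corAR1(form = ~ jday))
```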
Conversely, if you are modelling the trend in the response alone (including only Time), there may be systematic variation in the residuals about the fitted trend that the trend does not explain, and this can likewise violate the residual assumptions. In such cases you may need to include other covariates to render the residuals i.i.d.
Why is this an issue? When you test whether the trend component, for example, is significant, or whether the effects of the covariates are significant, the theory used assumes that the residuals are i.i.d. If they are not i.i.d., the assumptions are not met and the p-values will be biased.
The point of all this is that you need to model the various components of the data such that the residuals are i.i.d., so that the theory you use to test whether the fitted components are significant is valid.
As an example, consider seasonal data where we want to fit a model describing the long-term variation in the data, the trend. If we model only the trend and not the seasonal, cyclic variation, we cannot test whether the fitted trend is significant because the residuals will not be i.i.d. For such data we would need to fit a model with both a seasonal component and a trend component, plus a null model containing just the seasonal component, and then compare the two models using a generalized likelihood ratio test to assess the significance of the fitted trend. This is done using anova() on the $lme components of the two models fitted with gamm().
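Here is a minimal sketch of that comparison, assuming a hypothetical data frame dat with a response y, a day-of-year variable doy for the seasonal cycle, and a continuous time index time for the trend; since the two models differ in their trend term, they are fitted by ML rather than REML so the likelihood ratio test is meaningful.

```r
library(mgcv)

## Seasonal (cyclic) component plus trend
m_full <- gamm(y ~ s(doy, bs = "cc") + s(time), data = dat,
               knots = list(doy = c(0, 366)), method = "ML")

## Null model: seasonal component only
m_null <- gamm(y ~ s(doy, bs = "cc"), data = dat,
               knots = list(doy = c(0, 366)), method = "ML")

## Generalized likelihood ratio test on the $lme components
anova(m_null$lme, m_full$lme)
```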