# Solved – Longitudinal data: time series, repeated measures, or something else

In plain English:
I have a multiple regression or ANOVA model but the response variable for each individual is a curvilinear function of time.

• How can I tell which of the right-hand-side variables are responsible for significant differences in the shapes or vertical offsets of the curves?
• Is this a time-series problem, a repeated-measures problem, or something else entirely?
• What are the best practices for analyzing such data (preferably in `R`, but I'm open to using other software)?

In more precise terms:
Let's say I have a model $y_{ijk} = \beta_0 + \beta_1 x_i + \beta_2 x_j + \beta_3 x_i x_j + \epsilon_k$, but $y_{ijk}$ is actually a series of data points collected from the same individual $k$ at many time points $t$, which were recorded as a numeric variable. Plotting the data shows that for each individual, $y_{ijkt}$ is a quadratic or cyclical function of time whose vertical offset, shape, or frequency (in the cyclical case) might depend significantly on the covariates. The covariates do not change over time; i.e., an individual has a constant body weight or treatment group for the duration of the data-collection period.

So far I have tried the following `R` approaches:

1. MANOVA (via `car::Anova()`)

``Anova(lm(YT ~ A*B, data=mydata), idata=data.frame(TIME=factor(1:10)), idesign=~TIME)``

…where `YT` is a matrix whose columns are the time points, 10 of them in this example, but far more in the real data.

Problem: this treats time as a factor, but the time-points don't exactly match for each individual. Furthermore, there are many of them relative to the sample size so the model gets saturated. It seems like the shape of the response variable over time is ignored.

2. Mixed-model (as in Pinheiro and Bates, Mixed Effect Models in S and S-Plus)

``lme(fixed=Y ~ A*B*TIME + sin(2*pi*TIME) + cos(2*pi*TIME), data=mydata, random=~(TIME + sin(2*pi*TIME) + cos(2*pi*TIME))|ID, method='ML')``

…where `ID` is a factor that groups data by individual. In this example the response is cyclical over time, but there could instead be quadratic terms or other functions of time.

Problem: I'm not certain whether each time term is necessary (especially the quadratic terms), or which ones are affected by which covariates.

• Is `stepAIC()` a good method for selecting them?
• If it does remove a time-dependent term, will it also remove it from the `random` argument?
• What if I also use an autocorrelation structure (such as `corExp()`) that takes a formula in the `correlation` argument? Should that formula be the same as the one in `random`, or just `~1|ID`?
• The `nlme` package is rarely mentioned in the context of time series outside Pinheiro and Bates… is it not considered well suited to this problem?
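For reference, here is a self-contained sketch of approach 2 with a continuous-time autocorrelation structure (note the spelling is `corExp()`, not `corEXP()`). The data are simulated, the variable names (`Y`, `A`, `TIME`, `ID`) mirror those above, and the random-effects part is kept deliberately simpler than the fixed part, which is a common starting point rather than the definitive specification:

```r
library(nlme)

# Simulated cyclical data: 12 individuals, 20 time points each,
# with a per-individual offset and a group-dependent trend.
set.seed(1)
n_id <- 12; n_t <- 20
mydata <- expand.grid(ID = factor(1:n_id), TIME = seq(0, 2, length.out = n_t))
mydata$A <- factor(rep(rep(c("a1", "a2"), each = n_id / 2), n_t))
mydata$Y <- with(mydata,
  as.numeric(ID) * 0.1 + sin(2*pi*TIME) + 0.5*cos(2*pi*TIME) +
  (A == "a2") * 0.8 * TIME + rnorm(nrow(mydata), sd = 0.2))

# Random intercept per individual; corExp() handles irregular, continuous
# time because it models correlation as a function of the time distance.
fit <- lme(Y ~ A * TIME + sin(2*pi*TIME) + cos(2*pi*TIME),
           random = ~ 1 | ID,
           correlation = corExp(form = ~ TIME | ID),
           data = mydata, method = "ML")
anova(fit)  # sequential tests of the fixed terms
```

Comparing nested fits with `anova(fit1, fit2)` on ML fits is one way to decide whether individual time terms earn their keep.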
3. Fitting a quadratic or trigonometric model to each individual, and then using each coefficient as a response variable for multiple regression or ANOVA.

Problem: a multiple-comparison correction is necessary. I can't think of any other problems, which makes me suspicious that I'm overlooking something.
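As a concrete illustration of approach 3 (a hedged sketch on simulated data, not a recommendation), the two stages can be written as a per-individual `lm()` followed by a test on the resulting coefficients; a MANOVA on the whole coefficient matrix gives a single overall test before any per-coefficient comparisons:

```r
# Simulated layout: 12 individuals, 20 time points, group "a2" has a trend.
set.seed(1)
n_id <- 12; n_t <- 20
dat <- expand.grid(ID = factor(1:n_id), TIME = seq(0, 2, length.out = n_t))
dat$A <- factor(ifelse(as.numeric(dat$ID) <= 6, "a1", "a2"))
dat$Y <- sin(2*pi*dat$TIME) + (dat$A == "a2") * 0.8 * dat$TIME +
         rnorm(nrow(dat), sd = 0.2)

# Stage 1: fit the trigonometric model separately to each individual.
coefs <- t(sapply(split(dat, dat$ID), function(d)
  coef(lm(Y ~ TIME + sin(2*pi*TIME) + cos(2*pi*TIME), data = d))))

# Stage 2: the per-individual coefficients become the responses.
grp <- factor(ifelse(1:n_id <= 6, "a1", "a2"))
summary(manova(coefs ~ grp))            # one overall test across coefficients
summary(aov(coefs[, "TIME"] ~ grp))     # follow-up on a single coefficient
```

One further caveat this sketch makes visible: stage 2 treats the fitted coefficients as if they were measured without error, ignoring the stage-1 uncertainty that a mixed model would propagate.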

4. As previously suggested on this site (What is the term for a time series regression having more than one predictor?), there are ARIMAX and transfer function / dynamic regression models.

Problem: ARMA-based models assume discrete, evenly spaced times, don't they? As for dynamic regression, I heard of it for the first time today; before I delve into yet another new method that might not pan out, I thought it would be prudent to ask people who have done this before for advice.

As Jeromy Anglim said, it would help to know the number of time points you have for each individual; since you said "many", I would venture that functional data analysis might be a viable alternative. You might want to check the R package `fda` and look at the book by Ramsay and Silverman.
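To make the functional-data idea concrete, here is a minimal base-R sketch (simulated data; the `fda` package implements this machinery properly, but `splines` ships with R, so the illustration is self-contained): project each individual's curve onto a shared B-spline basis, then analyze the basis coefficients, so the whole curve shape enters one test instead of one test per time point.

```r
library(splines)

# Simulated curves: 12 individuals observed on a common 30-point grid,
# with group "a2" shifted upward by a constant offset.
set.seed(1)
n_id <- 12; n_t <- 30
t_grid <- seq(0, 1, length.out = n_t)
grp <- factor(ifelse(1:n_id <= 6, "a1", "a2"))
Y <- t(sapply(1:n_id, function(i)
  sin(2*pi*t_grid) + (grp[i] == "a2") * 0.5 + rnorm(n_t, sd = 0.1)))

# Shared basis: every curve is summarized by the same small set of
# coefficients, regardless of how many time points were observed.
B <- bs(t_grid, df = 6)
C <- t(apply(Y, 1, function(y) coef(lm(y ~ B))))

# One overall test: does group affect the curves' shape or offset?
summary(manova(C ~ grp))
```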
