Solved – Specifying Error() in R `aov` function

Consider a data where samples from different populations of 5 species are analyzed after 4 treatments at 3 time intervals. So the independent variables are Population, Species, Treatment and Time.

Species contains a set of Populations not shared with any other Species. Populations are nested within Species. The entire set of Populations are sampled for each Treatment and Time.

What is the most appropriate ANOVA model to use with aov in this case, if we want to compare Species, Treatments and Time effects along with their interactions and why? The following are the ones I could come up with.

m1 <- aov(y ~ Species*Treatment*Time, data)  m2 <- aov(y ~ Species*Treatment*Time + Error(Population/(Species*Treatment*Time)), data)  m3 <- aov(y ~ Species*Treatment*Time + Error(Population/(Time)), data) 

If you want to treat the Populations similarly to individual "subjects" in a repeated measures design, then the idea in specifying the Error term in aov for a factorial design like this is to include the within-subjects predictors in the Error but to exclude the between-subjects predictors. From that perspective, none of your formulas seems to be what you seek.

The first model ignores the Population variable. The second includes Species in the Error term even though each Population only belongs to a single Species (that is, Species is between-Populations, your equivalent to "subjects"). The third model might provide something useful, but the error term does not include the within-Populations variable of Treatment.

This Cross Validated Page has an almost identical example for this case, in the answer from @John, with 2 within-subjects and one between-subjects predictor. Translating the corresponding variable names, you would seem to want:

 m4 <- aov(y ~ Species*Treatment*Time + Error(Population/(Treatment*Time)), data) 

That said, you should be wary of approaching the analysis in this way. The aov function in R uses Type I sums of squares, so the order of entry of variable names into the model will matter, with Species accounting for as much variance as possible first, then Treatment, then Time, and then each of the interactions (in the order that R produces the interaction terms from the model specification). If that's what you want, fine, but be forewarned. That's different from other statistical software packages. Also, both that answer and the answer from @suncoolsu on that same page note that you might be better off using a mixed-effects model, which can be more flexible and may require fewer assumptions.

Similar Posts:

Rate this post

Leave a Comment