# Solved – Specifying Error() in R `aov` function

Consider a data where samples from different populations of 5 species are analyzed after 4 treatments at 3 time intervals. So the independent variables are `Population`, `Species`, `Treatment` and `Time`.

`Species` contains a set of `Populations` not shared with any other `Species`. `Populations` are nested within `Species`. The entire set of `Populations` are sampled for each `Treatment` and `Time`.

What is the most appropriate ANOVA model to use with `aov` in this case, if we want to compare Species, Treatments and Time effects along with their interactions and why? The following are the ones I could come up with.

``m1 <- aov(y ~ Species*Treatment*Time, data)  m2 <- aov(y ~ Species*Treatment*Time + Error(Population/(Species*Treatment*Time)), data)  m3 <- aov(y ~ Species*Treatment*Time + Error(Population/(Time)), data) ``
Contents

If you want to treat the `Populations` similarly to individual "subjects" in a repeated measures design, then the idea in specifying the `Error` term in `aov` for a factorial design like this is to include the within-subjects predictors in the `Error` but to exclude the between-subjects predictors. From that perspective, none of your formulas seems to be what you seek.

The first model ignores the `Population` variable. The second includes `Species` in the `Error` term even though each `Population` only belongs to a single `Species` (that is, `Species` is between-`Populations`, your equivalent to "subjects"). The third model might provide something useful, but the error term does not include the within-`Populations` variable of `Treatment`.

This Cross Validated Page has an almost identical example for this case, in the answer from @John, with 2 within-subjects and one between-subjects predictor. Translating the corresponding variable names, you would seem to want:

`` m4 <- aov(y ~ Species*Treatment*Time + Error(Population/(Treatment*Time)), data) ``

That said, you should be wary of approaching the analysis in this way. The `aov` function in R uses Type I sums of squares, so the order of entry of variable names into the model will matter, with `Species` accounting for as much variance as possible first, then `Treatment`, then `Time`, and then each of the interactions (in the order that R produces the interaction terms from the model specification). If that's what you want, fine, but be forewarned. That's different from other statistical software packages. Also, both that answer and the answer from @suncoolsu on that same page note that you might be better off using a mixed-effects model, which can be more flexible and may require fewer assumptions.

Rate this post