Consider a data where samples from different populations of 5 species are analyzed after 4 treatments at 3 time intervals. So the independent variables are Population
, Species
, Treatment
and Time
.
Species
contains a set of Populations
not shared with any other Species
. Populations
are nested within Species
. The entire set of Populations
are sampled for each Treatment
and Time
.
What is the most appropriate ANOVA model to use with aov
in this case, if we want to compare Species, Treatments and Time effects along with their interactions and why? The following are the ones I could come up with.
m1 <- aov(y ~ Species*Treatment*Time, data) m2 <- aov(y ~ Species*Treatment*Time + Error(Population/(Species*Treatment*Time)), data) m3 <- aov(y ~ Species*Treatment*Time + Error(Population/(Time)), data)
Best Answer
If you want to treat the Populations
similarly to individual "subjects" in a repeated measures design, then the idea in specifying the Error
term in aov
for a factorial design like this is to include the within-subjects predictors in the Error
but to exclude the between-subjects predictors. From that perspective, none of your formulas seems to be what you seek.
The first model ignores the Population
variable. The second includes Species
in the Error
term even though each Population
only belongs to a single Species
(that is, Species
is between-Populations
, your equivalent to "subjects"). The third model might provide something useful, but the error term does not include the within-Populations
variable of Treatment
.
This Cross Validated Page has an almost identical example for this case, in the answer from @John, with 2 within-subjects and one between-subjects predictor. Translating the corresponding variable names, you would seem to want:
m4 <- aov(y ~ Species*Treatment*Time + Error(Population/(Treatment*Time)), data)
That said, you should be wary of approaching the analysis in this way. The aov
function in R uses Type I sums of squares, so the order of entry of variable names into the model will matter, with Species
accounting for as much variance as possible first, then Treatment
, then Time
, and then each of the interactions (in the order that R produces the interaction terms from the model specification). If that's what you want, fine, but be forewarned. That's different from other statistical software packages. Also, both that answer and the answer from @suncoolsu on that same page note that you might be better off using a mixed-effects model, which can be more flexible and may require fewer assumptions.
Similar Posts:
- Solved – Using Fourier analysis to check if data is oscillating
- Solved – Response variable with exponential distribution – how to analyse
- Solved – Detect if there is actually two populations in a sample
- Solved – Reproducing SPSS GLM in R, changed coefficients
- Solved – How to check whether 2 populations are same by sample