I have a 10k lines dataframes on which I want to perform ANCOVA so I can get adjusted means.
Please note that I've never done this before so I jump from a tutorial to another, but I still want to make it the right way.
So my model is like Y ~ X * sex
, with
Y
the dependant variable (continuous)X
the continuous independant variablesex
the discrete independant variable (here the sex)
Reading this tutorial, I could calculate the Y mean adjusted on X for each sex :
model = aov(Y~sex*X, data=x) data.predict = data.frame(sex=c("Male", "Female"), X=mean(x$X, na.rm=T)) data.frame(data.predict, Y=predict(model, data.predict))
This gives realistic results, but I realized that anova(aov(Y~sex*X, data=x))
and anova(aov(Y~X*sex, data=x))
give very different results. The calculated means are the same with both models though.
Reading the EdM answer in the question https://stats.stackexchange.com/a/213358/81974, I tried with the car
package and Anova(model, type="III")
, and this time both give the same results.
I don't really understand how it could matter, but it seems that my data are unbalanced (the aov
help "Note" says that it could be misleading).
Knowing this, are the previously calculated adjusted means still usable ?
Best Answer
You have to be careful when you have factors interacting with covariates. Let me modify @Sal Mangiafico's example to provide a clearer illustration.
Data = transform(Data, Y = Y + (3 - 4*X)*as.numeric(Sex)) model2 = lm(Y ~ X + Sex + X:Sex, data = Data2)
Now we have:
emmeans(model2, ~ Sex:X) ## Sex X emmean SE df lower.CL upper.CL ## Female 10.5 -28.61896 0.6171581 16 -29.92727 -27.31064 ## Male 10.5 -67.69161 0.6171581 16 -68.99992 -66.38329
compared with what you get if you look at the max and min of X
:
emmeans(model2, ~ Sex:X, cov.reduce = range) ## Sex X emmean SE df lower.CL upper.CL ## Female 1 0.5713260 1.5820724 16 -2.782518 3.9251696 ## Male 1 -0.9773716 0.5713774 16 -2.188638 0.2338944 ## Female 20 -57.8092402 0.5713774 16 -59.020506 -56.5979743 ## Male 20 -134.4058425 1.5820724 16 -137.759686 -131.0519988
This illustrates why plotting the results is so important:
emmip(model2, Sex ~ X, cov.reduce = range)
For more discussion, see the vignette on interactions in the emmeans package.