I have a 10k lines dataframes on which I want to perform ANCOVA so I can get adjusted means.

Please note that I've never done this before so I jump from a tutorial to another, but I still want to make it the right way.

So my model is like `Y ~ X * sex`

, with

`Y`

the dependant variable (continuous)`X`

the continuous independant variable`sex`

the discrete independant variable (here the sex)

Reading this tutorial, I could calculate the Y mean adjusted on X for each sex :

`model = aov(Y~sex*X, data=x) data.predict = data.frame(sex=c("Male", "Female"), X=mean(x$X, na.rm=T)) data.frame(data.predict, Y=predict(model, data.predict)) `

This gives realistic results, but I realized that `anova(aov(Y~sex*X, data=x))`

and `anova(aov(Y~X*sex, data=x))`

give very different results. The calculated means are the same with both models though.

Reading the EdM answer in the question https://stats.stackexchange.com/a/213358/81974, I tried with the `car`

package and `Anova(model, type="III")`

, and this time both give the same results.

I don't really understand how it could matter, but it seems that my data are unbalanced (the `aov`

help "Note" says that it could be misleading).

Knowing this, are the previously calculated adjusted means still usable ?

**Contents**hide

#### Best Answer

You have to be careful when you have factors interacting with covariates. Let me modify @Sal Mangiafico's example to provide a clearer illustration.

`Data = transform(Data, Y = Y + (3 - 4*X)*as.numeric(Sex)) model2 = lm(Y ~ X + Sex + X:Sex, data = Data2) `

Now we have:

`emmeans(model2, ~ Sex:X) ## Sex X emmean SE df lower.CL upper.CL ## Female 10.5 -28.61896 0.6171581 16 -29.92727 -27.31064 ## Male 10.5 -67.69161 0.6171581 16 -68.99992 -66.38329 `

compared with what you get if you look at the max and min of `X`

:

`emmeans(model2, ~ Sex:X, cov.reduce = range) ## Sex X emmean SE df lower.CL upper.CL ## Female 1 0.5713260 1.5820724 16 -2.782518 3.9251696 ## Male 1 -0.9773716 0.5713774 16 -2.188638 0.2338944 ## Female 20 -57.8092402 0.5713774 16 -59.020506 -56.5979743 ## Male 20 -134.4058425 1.5820724 16 -137.759686 -131.0519988 `

This illustrates why plotting the results is so important:

`emmip(model2, Sex ~ X, cov.reduce = range) `

For more discussion, see the vignette on interactions in the **emmeans** package.