# Solved – calculate adjusted means from ANCOVA if data is unbalanced

I have a data frame of about 10k rows on which I want to run an ANCOVA so I can get adjusted means.

Please note that I have never done this before, so I am jumping from one tutorial to another, but I still want to do it the right way.

So my model is `Y ~ X * sex`, with

• `Y` the dependent variable (continuous)
• `X` the continuous independent variable
• `sex` the discrete independent variable (here the sex)

Following this tutorial, I could calculate the mean of Y adjusted for X for each sex:

```r
model = aov(Y ~ sex * X, data = x)
data.predict = data.frame(sex = c("Male", "Female"), X = mean(x$X, na.rm = TRUE))
data.frame(data.predict, Y = predict(model, data.predict))
```

This gives realistic results, but I realized that `anova(aov(Y~sex*X, data=x))` and `anova(aov(Y~X*sex, data=x))` give very different results. The calculated means are the same with both models though.

Reading EdM's answer at https://stats.stackexchange.com/a/213358/81974, I tried the `car` package with `Anova(model, type="III")`, and this time both orderings give the same results.

I don't really understand how it could matter, but it seems that my data are unbalanced (the "Note" in the `aov` help says that results can be hard to interpret without balance).

Knowing this, are the previously calculated adjusted means still usable?
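To see why term order changes the `anova()` table but not the predictions, here is a minimal sketch on simulated data. Everything in it (the data frame `d`, the group sizes, the coefficients) is made up for illustration and is not the data from the question. It uses `lm()` directly; `aov()` fits the same model by calling `lm()`. `anova()` reports sequential sums of squares, where each term is adjusted only for the terms listed before it, while `drop1()` with scope `. ~ .` tests each term after all the others, which is the idea behind a Type III table. Whether `drop1()` matches `car::Anova(type = "III")` number for number depends on the contrasts in use, so treat that part as an assumption.

```r
set.seed(1)

# Hypothetical unbalanced data: unequal group sizes, and X distributed
# differently in each sex, so sex and X are correlated.
n_f <- 30
n_m <- 90
d <- data.frame(
  sex = factor(rep(c("Female", "Male"), times = c(n_f, n_m))),
  X   = c(rnorm(n_f, mean = 8, sd = 2), rnorm(n_m, mean = 12, sd = 2))
)
d$Y <- 2 + 1.5 * d$X + ifelse(d$sex == "Male", 3, 0) + rnorm(nrow(d))

# Same model, two term orders
m1 <- lm(Y ~ sex * X, data = d)
m2 <- lm(Y ~ X * sex, data = d)

# Sequential (order-dependent) tables
a1 <- anova(m1)
a2 <- anova(m2)
a1["sex", "Sum Sq"]  # sex entered first: not adjusted for X
a2["sex", "Sum Sq"]  # sex entered after X: adjusted for X

# Each term tested after all the others (order-independent)
t1 <- drop1(m1, . ~ ., test = "F")
t2 <- drop1(m2, . ~ ., test = "F")
t1["sex", "Sum of Sq"]
t2["sex", "Sum of Sq"]

# Both orderings describe the same fitted model
same_fit <- isTRUE(all.equal(unname(fitted(m1)), unname(fitted(m2))))
same_fit
```

Both formulas span the same set of model columns, so the fitted values and anything computed from `predict()` agree. Only the way the sums of squares are split between `sex` and `X` depends on the order.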


You have to be careful when you have factors interacting with covariates. Let me modify @Sal Mangiafico's example to provide a clearer illustration.

```r
# Data comes from @Sal Mangiafico's example; Sex must be a factor for as.numeric(Sex)
Data2 = transform(Data, Y = Y + (3 - 4*X)*as.numeric(Sex))
model2 = lm(Y ~ X + Sex + X:Sex, data = Data2)
```

Now we have:

```r
emmeans(model2, ~ Sex:X)
## Sex       X    emmean        SE df  lower.CL  upper.CL
## Female 10.5 -28.61896 0.6171581 16 -29.92727 -27.31064
## Male   10.5 -67.69161 0.6171581 16 -68.99992 -66.38329
```

compared with what you get if you look at the max and min of `X`:

```r
emmeans(model2, ~ Sex:X, cov.reduce = range)
## Sex     X       emmean        SE df    lower.CL     upper.CL
## Female  1    0.5713260 1.5820724 16   -2.782518    3.9251696
## Male    1   -0.9773716 0.5713774 16   -2.188638    0.2338944
## Female 20  -57.8092402 0.5713774 16  -59.020506  -56.5979743
## Male   20 -134.4058425 1.5820724 16 -137.759686 -131.0519988
```

This illustrates why plotting the results is so important:

```r
emmip(model2, Sex ~ X, cov.reduce = range)
```

For more discussion, see the vignette on interactions in the emmeans package.
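If you want to check the same thing without `emmeans`, the sketch below does it with `predict()` in base R. It runs on simulated stand-in data, not the data used above, and the names `sim`, `adjusted_means`, and `gap` are hypothetical helpers introduced only for this illustration. The point is that with an `X:Sex` interaction, the Female minus Male difference in adjusted means depends on which value of `X` you adjust to.

```r
set.seed(42)

# Hypothetical data: both sexes cover the same X range, but slopes differ
sim <- data.frame(
  Sex = factor(rep(c("Female", "Male"), each = 10)),
  X   = rep(seq(1, 20, length.out = 10), times = 2)
)
sim$Y <- with(sim, ifelse(Sex == "Male", 2 - 7 * X, 3 - 3 * X)) + rnorm(nrow(sim))

fit <- lm(Y ~ X * Sex, data = sim)

# Predicted Y for each sex at the requested covariate value(s)
adjusted_means <- function(fit, data, x_values) {
  grid <- expand.grid(Sex = levels(data$Sex), X = x_values)
  grid$Y_hat <- predict(fit, newdata = grid)
  grid
}

# Female minus Male, one value per covariate value in the grid
gap <- function(g) g$Y_hat[g$Sex == "Female"] - g$Y_hat[g$Sex == "Male"]

at_mean  <- adjusted_means(fit, sim, mean(sim$X))   # adjust to the mean of X
at_range <- adjusted_means(fit, sim, range(sim$X))  # adjust to min and max of X

gap(at_mean)   # a single difference, valid only at the mean of X
gap(at_range)  # two quite different differences, at min(X) and max(X)
```

So the adjusted means at the mean of `X` are not wrong, but they summarize the sex difference at one covariate value only, and that difference is not constant when the slopes differ.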
