Calculate adjusted means from ANCOVA when the data are unbalanced

I have a 10k-row data frame on which I want to perform an ANCOVA so that I can get adjusted means.

Please note that I have never done this before, so I am jumping from one tutorial to another, but I still want to do it the right way.

My model is Y ~ X * sex, where:

  • Y is the dependent variable (continuous)
  • X is the continuous independent variable
  • sex is the categorical independent variable

Following this tutorial, I calculated the mean of Y adjusted for X for each sex:

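```r
# Fit the ANCOVA model with a sex-by-X interaction
model = aov(Y ~ sex * X, data = x)

# Reference grid: each sex at the overall mean of X
data.predict = data.frame(sex = c("Male", "Female"), X = mean(x$X, na.rm = TRUE))

# Adjusted means = model predictions on that grid
data.frame(data.predict, Y = predict(model, data.predict))
```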
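A minimal, self-contained sketch of the same calculation, assuming simulated data: the data frame `x`, the group sizes (300 females, 100 males) and the coefficients used to generate Y are all hypothetical. It also checks, from the coefficients under R's default treatment contrasts (Female as the reference level), that each adjusted mean is simply the fitted line for that sex evaluated at the overall mean of X.

```r
set.seed(1)

# Hypothetical unbalanced data: 300 females, 100 males, and X tends to be
# larger for males, so X and sex are correlated
n_f <- 300
n_m <- 100
x <- data.frame(
  sex = factor(rep(c("Female", "Male"), c(n_f, n_m))),
  X   = c(rnorm(n_f, mean = 10, sd = 2), rnorm(n_m, mean = 14, sd = 2))
)
x$Y <- 5 + 0.5 * x$X + 2 * (x$sex == "Male") + rnorm(n_f + n_m)

model <- aov(Y ~ sex * X, data = x)

# Reference grid: each sex at the overall mean of X
data.predict <- data.frame(
  sex = factor(c("Male", "Female"), levels = levels(x$sex)),
  X   = mean(x$X, na.rm = TRUE)
)
adj <- data.frame(data.predict, Y = predict(model, data.predict))
print(adj)

# The same numbers by hand (treatment contrasts, Female = reference level):
# intercept + slope * mean(X) for each sex
b <- coef(model)
xbar <- mean(x$X)
adj_female <- b[["(Intercept)"]] + b[["X"]] * xbar
adj_male   <- adj_female + b[["sexMale"]] + b[["sexMale:X"]] * xbar
```

Because the model contains the sex:X interaction, the two fitted lines have different slopes, so the adjusted difference between the sexes is only valid at the chosen value of X.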

This gives plausible results, but I noticed that anova(aov(Y ~ sex*X, data = x)) and anova(aov(Y ~ X*sex, data = x)) give very different results, even though the calculated means are the same for both models.
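The order dependence comes from anova() using sequential (type I) sums of squares: each term is tested after adjusting only for the terms listed before it. When the design is unbalanced and X differs between the sexes, "sex first" and "sex after X" are different questions. A sketch on hypothetical simulated data (not the poster's data set) shows that the sex rows differ while the fitted values, and therefore any predict()-based adjusted means, do not:

```r
set.seed(2)

# Hypothetical unbalanced data in which X and sex are correlated
x <- data.frame(
  sex = factor(rep(c("Female", "Male"), c(300, 100))),
  X   = c(rnorm(300, mean = 10, sd = 2), rnorm(100, mean = 14, sd = 2))
)
x$Y <- 5 + 0.5 * x$X + 2 * (x$sex == "Male") + rnorm(400)

m_sex_first <- aov(Y ~ sex * X, data = x)
m_X_first   <- aov(Y ~ X * sex, data = x)

# Sequential (type I) sums of squares: the "sex" and "X" rows depend on
# which term enters first
a1 <- anova(m_sex_first)
a2 <- anova(m_X_first)
print(a1)
print(a2)

# Both formulas describe the same model, so the fitted values are identical
same_fit <- isTRUE(all.equal(fitted(m_sex_first), fitted(m_X_first)))
```

The interaction row is the same in both tables, because in both orderings it is the last term and is adjusted for everything else.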

Reading EdM's answer at https://stats.stackexchange.com/a/213358/81974, I tried Anova(model, type="III") from the car package, and this time both orderings give the same results.
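Type III tests compare the full model with the model that drops one term's columns, so the order of terms in the formula is irrelevant. As a base-R stand-in for car::Anova(type = "III") (a substitution, not the car implementation), drop1() with scope = ~ . performs this column-dropping comparison; the data below are hypothetical. One caveat: with the sex:X interaction in the model and X not centered, the sex row tests the sex difference at X = 0, which may lie far outside the data. Centering X moves that comparison to the mean of X.

```r
set.seed(3)

# Hypothetical unbalanced data in which X and sex are correlated
x <- data.frame(
  sex = factor(rep(c("Female", "Male"), c(300, 100))),
  X   = c(rnorm(300, mean = 10, sd = 2), rnorm(100, mean = 14, sd = 2))
)
x$Y <- 5 + 0.5 * x$X + 2 * (x$sex == "Male") + rnorm(400)

# Sum-to-zero coding for sex, as is usually advised for type III tests
contrasts(x$sex) <- contr.sum(2)

m_sex_first <- lm(Y ~ sex * X, data = x)
m_X_first   <- lm(Y ~ X * sex, data = x)

# scope = ~ . drops every term, including main effects that appear in the
# interaction, from the full model: type III sums of squares
d1 <- drop1(m_sex_first, scope = ~ ., test = "F")
d2 <- drop1(m_X_first,   scope = ~ ., test = "F")
print(d1)
print(d2)
```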

I don't really understand why the order matters, but my data seem to be unbalanced (the "Note" section of the aov help page warns that results can be hard to interpret without balance).

Given this, are the adjusted means calculated above still usable?

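You have to be careful when a factor interacts with a covariate: the difference between the factor levels then depends on the value of the covariate, so means adjusted to a single value of X (by default, its mean) can hide a lot. Let me modify @Sal Mangiafico's example to give a clearer illustration.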

```r
# Add a sex-specific intercept and slope (as.numeric(Sex) is 1 or 2)
Data2 = transform(Data, Y = Y + (3 - 4*X)*as.numeric(Sex))
model2 = lm(Y ~ X + Sex + X:Sex, data = Data2)
```
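Sal Mangiafico's original Data is not reproduced here. The sketch below builds a hypothetical stand-in (20 rows, X = 1, ..., 20, a Sex factor with levels Female and Male, and a response with no sex effect) so the modification can be run; its numbers will not match the output shown. Because as.numeric(Sex) is 1 for Female and 2 for Male, the transform adds 3 - 4X to females and 6 - 8X to males, so the Male minus Female gap is 3 - 4X and its slope is -4.

```r
set.seed(4)

# Hypothetical stand-in for Sal Mangiafico's Data: X = 1..20,
# alternating sexes, and a response that does not depend on Sex
Data <- data.frame(
  X   = 1:20,
  Sex = factor(rep(c("Female", "Male"), times = 10))
)
Data$Y <- 2 + 0.5 * Data$X + rnorm(20, sd = 0.5)

# Females get 3 - 4*X added, males get 6 - 8*X
Data2 <- transform(Data, Y = Y + (3 - 4 * X) * as.numeric(Sex))

model2 <- lm(Y ~ X + Sex + X:Sex, data = Data2)
print(coef(model2))
```

The X:SexMale coefficient should come out close to -4, the slope difference built in by the transform.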

Now we have:

```r
library(emmeans)

emmeans(model2, ~ Sex:X)
##  Sex       X    emmean        SE df  lower.CL  upper.CL
##  Female 10.5 -28.61896 0.6171581 16 -29.92727 -27.31064
##  Male   10.5 -67.69161 0.6171581 16 -68.99992 -66.38329
```

Compare that with what you get at the minimum and maximum of X:

```r
emmeans(model2, ~ Sex:X, cov.reduce = range)
##  Sex     X       emmean        SE df    lower.CL     upper.CL
##  Female  1    0.5713260 1.5820724 16   -2.782518    3.9251696
##  Male    1   -0.9773716 0.5713774 16   -2.188638    0.2338944
##  Female 20  -57.8092402 0.5713774 16  -59.020506  -56.5979743
##  Male   20 -134.4058425 1.5820724 16 -137.759686 -131.0519988
```
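For a model with one factor and one covariate, these estimated marginal means are just the model's predictions on a grid of Sex crossed with the chosen X values: mean(X) by default, or range(X) with cov.reduce = range. A base-R sketch of that idea, using hypothetical stand-in data built with the same transform (the original Data is not shown), computes the Male minus Female gap on both grids:

```r
set.seed(5)

# Hypothetical stand-in data, modified as in the answer
Data <- data.frame(
  X   = 1:20,
  Sex = factor(rep(c("Female", "Male"), times = 10))
)
Data$Y <- 2 + 0.5 * Data$X + rnorm(20, sd = 0.5)
Data2 <- transform(Data, Y = Y + (3 - 4 * X) * as.numeric(Sex))
model2 <- lm(Y ~ X + Sex + X:Sex, data = Data2)

# Reference grids: Sex crossed with mean(X), and with range(X)
grid_mean  <- expand.grid(Sex = levels(Data2$Sex), X = mean(Data2$X))
grid_range <- expand.grid(Sex = levels(Data2$Sex), X = range(Data2$X))
grid_mean$pred  <- predict(model2, newdata = grid_mean)
grid_range$pred <- predict(model2, newdata = grid_range)

# Male minus Female at each X in a grid (expand.grid varies Sex fastest,
# so Female and Male rows for the same X line up)
sex_gap <- function(g) {
  male <- g$Sex == "Male"
  setNames(unname(g$pred[male] - g$pred[!male]), g$X[male])
}
gap_mean  <- sex_gap(grid_mean)
gap_range <- sex_gap(grid_range)
print(gap_mean)
print(gap_range)
```

The single gap at mean(X) is near 3 - 4 * 10.5 = -39, while the gaps at X = 1 and X = 20 are near -1 and -77, which is exactly what one adjusted mean per sex hides.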

This illustrates why plotting the results is so important:

```r
emmip(model2, Sex ~ X, cov.reduce = range)
```

[Figure: interaction plot of the EMMs]

For more discussion, see the vignette on interactions in the emmeans package.
