I attempted to run an ANCOVA with one binary predictor, one continuous outcome, and one continuous covariate. I found that there was heterogeneity of regression slopes and thus I concluded that an ANCOVA was inappropriate.

Should I then run a model with random slopes and random intercepts, i.e.

$$Y_{ij}=(\beta_{0}+u_{0j})+(\beta_{1}+u_{1j})X_{1j}+e_{ij}$$

Or should I just add an interaction term to my (formerly) ANCOVA model?

$$Y_{ij}=\beta_{0}+\beta_{1}X_{1i}+\beta_{2}X_{2j}+\beta_{3}\text{Interaction}_{ij}+e_{ij}$$

Or, does it not make any difference? Conceptually, what is the difference between the two approaches?

#### Best Answer

In this case, I think you want to use the model with the interaction. A random slopes/intercepts model implies that the slopes and intercepts vary across the levels of some random factor. For example, imagine that you have participants completing a task in which they make several responses to images under condition A and under condition B. You may be interested in the effect of condition on your outcome, controlling for gender. Thus you might run an ANCOVA as you tried before, and it would look like this:

$$ \text{Outcome}_{ij} = \beta_0 + \beta_1\text{Condition}_{ij} + \beta_2\text{Gender}_j + \epsilon_{ij} $$

Here i represents a particular trial and j represents a particular participant. Since condition varies within each participant, you have different condition values within participants. Since gender varies between participants, gender only varies as a function of j and not i. In this example, this sort of model would be incorrect because it ignores a violation of the assumption of independent errors: responses from the same participant are correlated. If you had such a repeated measures design, it would be correct instead to allow the intercept and the slope of condition to vary from participant to participant, like so:

$$ \text{Outcome}_{ij} = (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\text{Condition}_{ij} + \beta_2\text{Gender}_j + \epsilon_{ij} $$
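To make the notation concrete, here is a minimal Python sketch that simulates data from the model above using only the standard library. Everything specific in it is an assumption for illustration: the 30 participants with 20 trials each, the coefficient values, the standard deviations, and the function names `simulate` and `condition_effect_by_participant`.

```python
import random
import statistics


def simulate(n_participants=30, n_trials=20, beta0=1.0, beta1=0.5, beta2=0.3,
             sd_u0=0.8, sd_u1=0.4, sd_eps=0.5, seed=1):
    """Generate (participant, trial, condition, gender, outcome) rows."""
    rng = random.Random(seed)
    rows = []
    for j in range(n_participants):
        gender = j % 2             # between-participant: depends on j only
        u0 = rng.gauss(0, sd_u0)   # participant-specific intercept deviation
        u1 = rng.gauss(0, sd_u1)   # participant-specific slope deviation
        for i in range(n_trials):
            condition = i % 2      # within-participant: changes across trials
            eps = rng.gauss(0, sd_eps)
            y = (beta0 + u0) + (beta1 + u1) * condition + beta2 * gender + eps
            rows.append((j, i, condition, gender, y))
    return rows


def condition_effect_by_participant(rows):
    """Mean outcome under condition 1 minus condition 0, per participant."""
    effects = {}
    for j in sorted({r[0] for r in rows}):
        a = [r[4] for r in rows if r[0] == j and r[2] == 0]
        b = [r[4] for r in rows if r[0] == j and r[2] == 1]
        effects[j] = statistics.mean(b) - statistics.mean(a)
    return effects


rows = simulate()
effects = condition_effect_by_participant(rows)
values = list(effects.values())
print("mean condition effect:", statistics.mean(values))
print("SD of condition effect across participants:", statistics.stdev(values))
```

The per-participant condition effects scatter around beta_1 because each participant carries their own u_{1j}; that spread, beyond what trial-level noise alone would produce, is what the random slope variance describes.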

Such a model would solve your non-independence problem, but likely not your homogeneity of regression problem, since (in this example) a violation of homogeneity of regression would mean that the effect of condition depends on gender. Thus, even with a random slopes and intercepts model, you would want to include the Condition*Gender interaction. I suppose you could technically allow the slope and intercept in your model to vary randomly as a function of gender, but I think this makes little sense for two reasons.

First, gender is a fixed effect, and it is likely helpful for you to be able to easily determine by how much the condition effect differs between males and females. The random intercept/slope model will only provide you with an estimate of the variance of the Condition effect across levels of the random factor.

Second, it gets tricky to estimate random-effect variances with only two levels of the random factor. The authors of the GLMM wiki (http://glmm.wikidot.com/faq) state that a random factor should have a minimum of 5-6 levels. They also cite Crawley, M. J. (2002), Statistical Computing: An Introduction to Data Analysis using S-PLUS, John Wiley & Sons, which may be useful reading regarding this particular consideration.
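For reference, and using the same notation as in the equations above, a random intercepts and slopes model that also includes the Condition*Gender interaction as a fixed effect could be written as follows (beta_3 is just my label for the interaction coefficient):

$$ \text{Outcome}_{ij} = (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\text{Condition}_{ij} + \beta_2\text{Gender}_j + \beta_3\text{Condition}_{ij}\text{Gender}_j + \epsilon_{ij} $$

Assuming 0/1 coding of both variables, beta_3 is the difference in the condition effect between the two genders, while the variance of u_{1j} captures the remaining participant-to-participant variation in that effect.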

Anyway, my recommendation would be to stick with the model that includes the X1*X2 interaction, although the notation in your interaction model (the i and j subscripts) suggests that you might also want to make sure you're accounting for potential violations of the assumption of independence.
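As a rough illustration of what the interaction term buys you, the sketch below simulates a binary predictor and a continuous covariate, fits the interaction model by ordinary least squares, and checks that the interaction coefficient equals the difference between the two groups' regression slopes. It uses only the Python standard library. The data, the true coefficient values, and the helper names `solve`, `ols`, and `slope_within` are all made up for the example; in practice you would use a regression routine from a statistics package.

```python
import random


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[k][n] / M[k][k] for k in range(n)]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    XtX = [[sum(row[a] * row[c] for row in X) for c in range(p)]
           for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    return solve(XtX, Xty)


# Simulated data (illustrative values only): slope of the covariate is
# 1.0 in group 0 and 1.0 + 0.8 in group 1, i.e. heterogeneous slopes.
rng = random.Random(0)
group, covariate, outcome = [], [], []
for k in range(200):
    g = k % 2
    x = rng.gauss(0, 1)
    y = 2.0 + 0.5 * g + 1.0 * x + 0.8 * g * x + rng.gauss(0, 1)
    group.append(g)
    covariate.append(x)
    outcome.append(y)

# Interaction model: intercept, group, covariate, group * covariate.
X_full = [[1.0, g, x, g * x] for g, x in zip(group, covariate)]
b0, b1, b2, b3 = ols(X_full, outcome)


def slope_within(level):
    """Slope of outcome on covariate using only one group's observations."""
    X = [[1.0, x] for g, x in zip(group, covariate) if g == level]
    y = [yi for g, yi in zip(group, outcome) if g == level]
    return ols(X, y)[1]


slope0, slope1 = slope_within(0), slope_within(1)
print("interaction coefficient:", b3)
print("difference in group slopes:", slope1 - slope0)
```

The two printed numbers agree up to floating-point error, which is the sense in which the interaction term tells you directly how much the slope differs between the two levels of the binary predictor.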