Solved – Can ANCOVA be used with a dichotomous dependent variable? If not, what have these authors done?

As stated above, if and how can ANCOVA be used with a dichotomous variable as the dependent/outcome variable? I found an article whose analyses I wish to replicate. It runs a series of ANCOVA models to produce age-, sex-, and race-adjusted estimates of risk factors (both categorical and continuous) for people falling into three categories. I understand how this can be done for continuous risk factors such as glucose level, because ANCOVA, as I understand it, has a continuous dependent variable and both categorical and continuous independent/predictor variables. However, I do not understand how this can be done with a categorical variable such as diabetes.

The results of the analyses are given in table 2 and the description of the methods is as follows: "Age, sex and race-adjusted means and percentages by ABI group at follow-up were computed using analysis of covariance. The p-values from these analyses compared those progressing into the low or high groups to those maintaining an ABI in the normal range."

Can anyone help me understand what the authors might have done, whether it is correct, and where I can find documentation on how to do this? (I generally work with R, but other sources are fine, too.) Your help is greatly appreciated.

The authors are talking about odds, so I'd guess they used logistic regression with multiple covariables ("multivariable regression").
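If the authors did use logistic regression, adjusted percentages like those in their table could be obtained roughly as follows. This is only a sketch in Python on simulated data, not the authors' actual procedure: the variable names (`group`, `age`, `sex`, `diabetes`), the simulated effect sizes, and the helpers `fit_logistic` and `design` are all assumptions made for illustration, and race is left out for brevity. The model is fitted by Newton-Raphson, and the adjusted percentage for each group is the average predicted probability when everyone is assigned to that group while keeping their own age and sex.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def solve(A, b):
    # Gaussian elimination with partial pivoting for a small system A x = b.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def fit_logistic(X, y, iterations=25):
    # Maximum likelihood via Newton-Raphson (hypothetical helper, not a library call).
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = p * (1 - p)
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    hess[a][c] += w * xi[a] * xi[c]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

def design(group, age, sex):
    # Intercept, two group dummies (reference: "normal"), scaled age, sex.
    return [1.0, float(group == "low"), float(group == "high"),
            (age - 60) / 10, float(sex)]

# Simulated data; every number below is made up for illustration.
random.seed(1)
group_effect = {"normal": 0.0, "low": 0.8, "high": 0.5}
rows = []
for _ in range(600):
    group = random.choice(["normal", "low", "high"])
    age = random.gauss(60, 8)
    sex = random.randint(0, 1)
    lin = -2.0 + 0.04 * (age - 60) + 0.3 * sex + group_effect[group]
    diabetes = 1 if random.random() < sigmoid(lin) else 0
    rows.append((group, age, sex, diabetes))

X = [design(g, a, s) for g, a, s, _ in rows]
y = [r[3] for r in rows]
beta = fit_logistic(X, y)

# Adjusted percentage: assign everyone to group g, keep their own age and sex,
# then average the predicted probabilities.
adjusted = {}
for g in ("normal", "low", "high"):
    probs = [sigmoid(sum(b * v for b, v in zip(beta, design(g, a, s))))
             for _, a, s, _ in rows]
    adjusted[g] = 100 * sum(probs) / len(probs)

for g, pct in adjusted.items():
    print(f"{g}: {pct:.1f}% with diabetes (age- and sex-adjusted)")
```

In R, the same kind of model would typically be fitted with `glm(..., family = binomial)`.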

The terms "ANOVA" and "ANCOVA" do not make much sense in the framework of generalized linear models, because what is compared there are not (residual) variances but "deviances" (a quantity linked to the maximized value of the likelihood function). Still, the terms are sometimes used for analyses that mimic the ANOVA/ANCOVA of linear models. It would be better to say "ANOVA-like analysis" or "ANCOVA-like analysis" to make the distinction clear.
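To make the role of the deviance concrete, here is the standard definition for a binary response, written with symbols that are my own notation rather than the article's: $y_i \in \{0, 1\}$ is the outcome, $\hat{p}_i$ the fitted probability, and $D_0$, $D_1$ the deviances of a smaller model and of a larger model that contains it.

```latex
% Deviance of a logistic model (the saturated log-likelihood is 0 for 0/1 data)
D = -2 \sum_{i=1}^{n} \left[ y_i \log \hat{p}_i + (1 - y_i) \log (1 - \hat{p}_i) \right]

% Likelihood ratio test for nested models, the analogue of the F test in ANCOVA;
% q is the number of extra parameters in the larger model
D_0 - D_1 \;\overset{\text{approx.}}{\sim}\; \chi^2_{q}
```

Dropping the group term from the model and comparing the two deviances in this way is what an "ANCOVA-like" test of the group effect would amount to.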

BTW: It is quite unconventional to use the classic linear model (ordinary least squares) with a binary response. There are several reasons:

  1. The equal variance assumption of OLS is violated because the conditional variance depends on the conditional mean. This is a problem for inferential results, particularly when fitted probabilities are close to 0 or 1.
  2. Predictions might be outside the unit interval. This can easily be fixed by setting predictions below 0 to 0 and those above 1 to 1.
  3. The additive structure of the model equation is often unnatural: using OLS, a jump from 50% to 60% probability is treated the same as a jump from 90% to 100%.
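The three points can be checked with a few lines of Python. This is a toy illustration with made-up data (the `x` and `y` values are assumptions, not taken from the article), and it uses 99% rather than 100% in the last step because the log-odds of 100% are infinite.

```python
import math

def logit(p):
    # Log-odds of a probability strictly between 0 and 1.
    return math.log(p / (1 - p))

# Point 1: the variance of a 0/1 outcome is p(1 - p), so it changes with the mean.
variances = {p: p * (1 - p) for p in (0.5, 0.9, 0.99)}

# Point 2: a least-squares line fitted to 0/1 data can leave the unit interval.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x
pred = [intercept + slope * xi for xi in x]
clipped = [min(1.0, max(0.0, p)) for p in pred]  # the simple fix from point 2

# Point 3: equal steps in probability are very unequal steps in log-odds.
step_mid = logit(0.6) - logit(0.5)     # about 0.41
step_high = logit(0.99) - logit(0.9)   # about 2.40

print(variances)
print(min(pred), max(pred))
print(step_mid, step_high)
```

The fitted line predicts a value below 0 at `x = 0` and above 1 at `x = 9`, and the step from 90% to 99% is several times larger on the log-odds scale than the step from 50% to 60%.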
