I have just completed a multilevel, longitudinal logistic regression testing, at four different time points, whether participants in an experimental group are more likely to have committed any drug-related crime in the previous 4 weeks than participants receiving placebo.

The syntax for the glmer looks like this

`sumDrug <- summary(drugMod <- glmer(drugCrime ~ group*week + (1|id), data = oti, family = binomial, control = glmerControl(optimizer = "bobyqa"), nAGQ = 10)) `

Where `drugCrime`

is the binary outcome variable "have committed one or more drug-related crimes in previous 4 weeks?" (Y/N response), `group`

is a factor indicating group allocation, and `week`

is a factor indicating time of measurement (at weeks 0, 4, 8, and 12).

If I treatment code the two factors (see here) the output of the regression (with Odds Ratios and CIs) looks like this:

` var Estimate Std..Error z.value Pr...z.. OR lo hi 1 (Intercept) -1.234 0.562 -2.197 0.028 0.29 0.04 2.28 2 group2 -0.108 0.779 -0.139 0.890 0.90 0.11 7.02 3 week2 -1.450 0.632 -2.295 0.022 0.23 0.03 1.84 4 week3 -0.965 0.652 -1.480 0.139 0.38 0.05 2.98 5 week4 -0.696 0.652 -1.067 0.286 0.50 0.06 3.90 6 group2:week2 -1.366 1.011 -1.351 0.177 0.26 0.03 2.00 7 group2:week3 -0.822 1.001 -0.821 0.411 0.44 0.06 3.44 8 group2:week4 -2.580 1.136 -2.271 0.023 0.08 0.01 0.59 `

And the output of the `Anova(drugMod)`

in the `car`

package returns

`Analysis of Deviance Table (Type III Wald chisquare tests) Response: drugCrime Chisq Df Pr(>Chisq) (Intercept) 4.8248 1 0.02805 * group 0.0193 1 0.88955 week 5.5199 3 0.13745 group:week 5.4146 3 0.14383 `

If, on the other hand, I simple code the two factors (also see here) the output of the regression looks like this:

` var Estimate Std..Error z.value Pr...z.. OR lo hi 1 (Intercept) -2.661 0.499 -5.328 0.000 0.07 0.01 2 group1 -1.300 0.742 -1.752 0.080 0.27 0.03 3 week2 -2.133 0.548 -3.890 0.000 0.12 0.02 4 week3 -1.376 0.532 -2.584 0.010 0.25 0.03 5 week4 -1.986 0.600 -3.311 0.001 0.14 0.02 6 group1:week2 -1.366 1.011 -1.351 0.177 0.26 0.03 7 group1:week3 -0.822 1.001 -0.821 0.411 0.44 0.06 8 group1:week4 -2.580 1.136 -2.271 0.023 0.08 0.01 `

this time the output of the `Anova(drugMod)`

is

`Analysis of Deviance Table (Type III Wald chisquare tests) Response: drugCrime Chisq Df Pr(>Chisq) (Intercept) 28.3890 1 9.923e-08 *** group 3.0684 1 0.0798276 . week 17.8728 3 0.0004672 *** group:week 5.4149 3 0.1438151 `

I understand why the regression outputs are different with the two different contrast coding schemes, but why am I getting two different `Anova()`

outputs for these two different coding schemes, and which is the 'correct' one to report? In a reply to this post @Ben Bolker quotes the `Anova()`

help, saying "Be very careful in formulating the model for type-III tests, or the hypotheses tested will not make sense." But I am not sure what 'makes sense' means in this context.

#### Best Answer

You have to be a little careful with `car:Anova(..., type=3)`

since it only performs Type III tests if the sum-to-zero constrast coding (`contr.sum`

) is used and if there are no "missing cells" or other estimability issues (to borrow a term from linear model theory).

The contrasts that `car:Anova(..., type=3)`

produce is what I would call *marginal* contrasts and also what you get from `nlme::anova.lme(..., type="marginal")`

. In "nice" cases these contrasts correspond to Type III contrasts if `contr.sum`

is used.

My advice is therefore to:

Look at your data to check if there are any missing cells (use

`table(group, week)`

). Estimability issues often show up as`NA`

in summary coefficient tables if you use`lm`

but`glmer`

may not display them. From the looks of your model summary you are*probably*ok.Set the contrast-coding to

`contr.sum`

, fit the model, obtain the Type III tests:`options(contrasts = c("contr.sum", "contr.poly")) drugMod <- glmer(drugCrime ~ group*week + (1|id), ....) car::Anova(drugMod, type=3)`

It is possible to obtain the Type III tests using contrasts of Least-Square Means (or in this case their GLM generalisation). Russell Lenth has added extensive guides to his

**emmeans**package and this vignette describes how to obtain the type III tests in this way. I would compare these with the results from`car::Anova`

. Note that this approach will also not yield Type III if there are missing cells or estimability issues, but**emmeans**warns you if it detects such problems (and I think this is also implemented for`glmer`

fits).

This very recommendable vignette has a very insightful discussion of Type III contrasts. It termed the approach in `car:Anova(..., type=3)`

Not-Safe-Type-Three and it mentions that the instruction to use `contr.sum`

is in the `car`

-book (see footnote on page 15). It also has a description of which hypotheses are actually being tested if you use `contr.treatment`

or `contr.SAS`

and why these are probably not hypotheses that you are actually interested in (this addresses the "will not make sense" quote you mention).

### Edit in order to answer: "what is 'missing cells' and how to detect them?"

As an example of a real data set with missing cells, consider the soup data from the **ordinal** package. Note the 0-entry in the following tabulation:

`data("soup", package="ordinal") soup$RESPONSE <- as.numeric(as.character(soup$SURENESS)) with(soup, table(DAY, PRODID)) PRODID DAY 1 2 3 4 5 6 1 369 94 184 93 88 93 2 370 275 0 92 97 92 `

If we fit a model with the interaction between `PRODID`

and `DAY`

, one coefficient is not estimable which shows up as `NA`

:

`fm <- lm(RESPONSE ~ DAY * PRODID, data=soup) summary(fm) Call: lm(formula = RESPONSE ~ DAY * PRODID, data = soup) Residuals: Min 1Q Median 3Q Max -4.2043 -1.4486 0.8043 1.3818 2.5514 Coefficients: (1 not defined because of singularities) Estimate Std. Error t value Pr(>|t|) (Intercept) 3.94580 0.09421 41.883 < 2e-16 *** DAY2 -0.49715 0.13314 -3.734 0.000194 *** PRODID2 0.67122 0.20908 3.210 0.001349 ** PRODID3 1.11398 0.16332 6.821 1.23e-11 *** PRODID4 0.59183 0.20998 2.819 0.004876 ** PRODID5 0.92920 0.21469 4.328 1.58e-05 *** PRODID6 1.25850 0.20998 5.993 2.47e-09 *** DAY2:PRODID2 0.49831 0.25392 1.962 0.049861 * DAY2:PRODID3 NA NA NA NA DAY2:PRODID4 0.51386 0.29756 1.727 0.084347 . DAY2:PRODID5 0.62215 0.29784 2.089 0.036854 * DAY2:PRODID6 0.48850 0.29756 1.642 0.100823 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 1.81 on 1836 degrees of freedom Multiple R-squared: 0.1022, Adjusted R-squared: 0.09735 F-statistic: 20.91 on 10 and 1836 DF, p-value: < 2.2e-16 `

In general any source of non-estimability is a problem but **R** goes a long way to eliminate such problems in the way design matrices are constructed. I can think of two relevant sources of non-estimability: missing cells and perfectly correlated covariates and most people already automatically avoid the latter.

If there were no missing cells the Type III test of `DAY`

would test the difference between days at an average `PRODID`

(i.e., a simple, flat, unweighted average). This is exactly the Yates contrast that Therneau discusses, but in the presence of missing cells this straight-forward interpretation no longer holds. You might want to stay clear of Type III tests in such cases due to lack of interpretability…

### Similar Posts:

- Solved – Why are results different when using aov_ez{afex} and Anova{car}, Type III SS in R
- Solved – Testing anova hypothesis with contrasts in R and SPSS
- Solved – How to deal with unbalanced group sizes in mixed design analysis
- Solved – Reproducing SPSS GLM in R, changed coefficients
- Solved – F-test differences Stata and R