# Level of factor taken as intercept

I am using a GLM to analyze binomial data from one factor (Group) with three levels: Control, Control Treatment and Treatment.

```r
m3 <- glm(Survive ~ Group, family = binomial, data = dat2)
summary(m3)
```

When I run the analysis, however, the model takes Control as the intercept, and I'm not sure why. Also, in previous analyses with GLMs I have never seen the levels of a factor presented separately in the summary:

```
Call:
glm(formula = Survive ~ Group, family = binomial, data = dat2)

Deviance Residuals:
   Min      1Q  Median      3Q     Max
-1.354  -1.177   0.000   1.177   1.354

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)      0.4055     0.6455   0.628    0.530
GroupCtrl Trt   -0.4055     0.9037  -0.449    0.654
GroupTreatment  -0.8109     0.9129  -0.888    0.374

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 41.589  on 29  degrees of freedom
Residual deviance: 40.783  on 27  degrees of freedom
AIC: 46.783

Number of Fisher Scoring iterations: 4
```

Edit 1:
Normally I would see the intercept and then the factor in the summary; I would only see the separate levels in a post-hoc multiple comparison. My data consist of two columns: one is the treatment (Ctrl, Ctrl Trt, Treatment), the other is binary: 1 for survival and 0 for loss.

NEST = nest id, not used in this analysis.
```
> str(dat2)
'data.frame':	30 obs. of  3 variables:
 $ NEST   : num  3 6 9 12 15 18 21 24 27 30 ...
 $ Group  : Factor w/ 3 levels "Control","Ctrl Trt",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Survive: num  1 1 0 0 0 1 0 1 1 1 ...
```
I do not want to omit the intercept; I'm also confused about how this could happen.

Edit 2: adding + 0 to the model

```
Call:
glm(formula = Survive ~ Group + 0, family = binomial, data = dat2)

Deviance Residuals:
   Min      1Q  Median      3Q     Max
-1.354  -1.177   0.000   1.177   1.354

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
GroupControl     0.4055     0.6455   0.628     0.53
GroupCtrl Trt    0.0000     0.6325   0.000     1.00
GroupTreatment  -0.4055     0.6455  -0.628     0.53

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 41.589  on 30  degrees of freedom
Residual deviance: 40.783  on 27  degrees of freedom
AIC: 46.783

Number of Fisher Scoring iterations: 4
```

Edit 3: The 30 nests were observed in two series, and I'd like to add this as a factor:

```
'data.frame':	30 obs. of  4 variables:
 $ NEST   : Factor w/ 30 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Group  : Factor w/ 3 levels "Control","Ctrl Trt",..: 3 2 1 3 2 1 3 2 1 3 ...
 $ Survive: num  1 1 1 0 1 1 0 1 0 0 ...
 $ Series : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
```

However, when I add this factor to the model (still with `+ 0`), I get confusing results again; for instance, there is no coefficient for Series1:

```
Call:
glm(formula = Survive ~ Group * Series + 0, family = binomial, 
    data = dat1)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.7941  -0.6681   0.0000   0.6681   1.7941

Coefficients:
                         Estimate Std. Error z value Pr(>|z|)  
GroupControl           -4.055e-01  9.129e-01  -0.444    0.657  
GroupCtrl Trt           1.386e+00  1.118e+00   1.240    0.215  
GroupTreatment         -1.386e+00  1.118e+00  -1.240    0.215  
Series2                 1.792e+00  1.443e+00   1.241    0.214  
GroupCtrl Trt:Series2  -4.564e+00  2.141e+00  -2.132    0.033 *
GroupTreatment:Series2  1.133e-15  2.041e+00   0.000    1.000  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 41.589  on 30  degrees of freedom
Residual deviance: 33.476  on 24  degrees of freedom
AIC: 45.476

Number of Fisher Scoring iterations: 4
```

That's how "treatment contrasts" work. In this simple model the first level of the factor does not get a column of its own in the model matrix: it is absorbed into the intercept, and the remaining coefficients are differences in log-odds from that base level. The base level is the first element of `levels(Group)`, and `factor()` sorts levels alphabetically by default, which is why "Control" became the intercept. Each statistical system chooses a default contrast strategy, so R's differs from SAS's or SPSS's. If the model were more complex, with multiple factor predictors, the "Intercept" would apply to the cases that had the base level of every factor. If there were continuous covariates, the intercept would be the predicted "effect" (here, the log-odds) for a hypothetical case with all factors at their base levels and all continuous predictors at zero. (Obviously this might not be a physically interpretable scenario.) In this instance you could use a different formula to get the labeling you expected:

```r
glm(formula = Survive ~ Group + 0, family = binomial, data = dat2)
```

This is my attempt to reconstruct the results of that call:

```
Call:
glm(formula = Survive ~ Group + 0, family = binomial, data = dat2)

Deviance Residuals:
   Min      1Q  Median      3Q     Max
-1.354  -1.177   0.000   1.177   1.354

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
GroupControl     0.4055     0.6455   0.628     0.53
GroupCtrl Trt    0.0000     0.6325   0.000     1.00
GroupTreatment  -0.4055     0.6455  -0.628     0.53

(Dispersion parameter for binomial family taken to be 1)
```

So that shows that the coefficient for the "Ctrl Trt" group was zero, which implies exactly 50% survival in that group (log-odds of 0 means odds of 1). When you omit the intercept in a single-factor model, each coefficient is simply the log-odds of survival for the corresponding factor level. The "Treatment" group coefficient suggests that 4 out of a group of 10 subjects survived, since `exp(-0.4055)` gives 0.6666434, which is very close to 4/(10-4). And in your "Control" group there were 6 survivors out of 10, since `exp(0.4055)` gives 1.500052, which is very close to 6/(10-6). (Remember that we are modeling log-odds, not probabilities.)
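The inferred counts can be checked directly. The sketch below builds a *hypothetical* data set with 6, 5 and 4 survivors out of 10 in Control, Ctrl Trt and Treatment (counts inferred from the coefficients above, not the asker's actual data), fits the model with and without the intercept, and converts the log-odds back to odds and probabilities with base R's `exp()` and `plogis()`.

```r
# HYPOTHETICAL data: 10 nests per group with 6, 5 and 4 survivors.
# These counts are inferred from the coefficients, not taken from dat2.
dat_hyp <- data.frame(
  Group   = factor(rep(c("Control", "Ctrl Trt", "Treatment"), each = 10)),
  Survive = c(rep(1:0, c(6, 4)), rep(1:0, c(5, 5)), rep(1:0, c(4, 6)))
)

# With intercept: (Intercept) is Control, the others are differences from it
m_int <- glm(Survive ~ Group, family = binomial, data = dat_hyp)
# Without intercept: one log-odds of survival per group
m_noint <- glm(Survive ~ Group + 0, family = binomial, data = dat_hyp)

round(coef(m_int), 4)
round(coef(m_noint), 4)

# Log-odds -> odds (survivors / losses) -> probability of survival
odds <- exp(coef(m_noint))
prob <- plogis(coef(m_noint))  # same as odds / (1 + odds)
round(cbind(odds, prob), 4)

# For a single group, the Wald standard error of the log-odds is
# sqrt(1/k + 1/(n - k)) with k survivors out of n nests
k <- c(6, 5, 4)
n <- 10
round(sqrt(1 / k + 1 / (n - k)), 4)
```

Both fits have the same residual deviance; dropping the intercept only changes how the same three group-wise log-odds are labeled.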

In general, it's better (as in less confusing to the "uninitiated") not to omit the intercept, but for a single-factor model omitting it can be helpful.
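Looking at the model matrix R builds can make the coding visible: why "Control" is the intercept, why `+ 0` gives each level its own column, and why only the first factor benefits from `+ 0` when there are two factors (which is why Edit 3 shows no `Series1`). This is a sketch on a small *hypothetical* design using the level names from the question; `model.matrix()`, `expand.grid()` and `relevel()` are base R.

```r
# Hypothetical design: every Group x Series combination once
d <- expand.grid(
  Group  = factor(c("Control", "Ctrl Trt", "Treatment")),
  Series = factor(c("1", "2"))
)

# factor() sorts levels alphabetically by default, so "Control" is first
levels(d$Group)

# Default treatment contrasts: the first level has no column of its own
colnames(model.matrix(~ Group, data = d))
# Without the intercept, every Group level gets its own indicator column
colnames(model.matrix(~ Group + 0, data = d))

# Two factors without intercept (as in Edit 3): only the first factor is
# fully indicator-coded; Series keeps treatment contrasts, so Series1 is
# the baseline and gets no column
mm2 <- model.matrix(~ Group * Series + 0, data = d)
colnames(mm2)

# One option if a separate log-odds per Group x Series cell is wanted:
# the interaction alone without intercept gives six indicator columns
colnames(model.matrix(~ Group:Series + 0, data = d))

# A different reference level can be chosen with relevel()
d$Group2 <- relevel(d$Group, ref = "Treatment")
colnames(model.matrix(~ Group2, data = d))
```

All of these formulas describe the same set of fitted probabilities for a given design; they differ only in what each coefficient measures.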

I'm actually having difficulty figuring out how you could have produced that particular result (two coefficients whose absolute values are exactly one-half of the value of the third). I'm wondering whether you have somehow duplicated cases. You should a) describe the data collection and b) post the output of `str(dat2)`.
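A quick way to look for duplicated cases before fitting, sketched on a tiny *hypothetical* stand-in for `dat2` (the real data are not shown here; nest 12 is entered twice on purpose so the checks have something to find). `anyDuplicated()`, `duplicated()` and `table()` are base R.

```r
# HYPOTHETICAL stand-in for dat2; replace with the real data frame.
# Nest 12 is deliberately entered twice.
dat2 <- data.frame(
  NEST    = c(3, 6, 9, 12, 12),
  Group   = factor(c("Control", "Ctrl Trt", "Treatment", "Control", "Control")),
  Survive = c(1, 1, 0, 0, 0)
)

str(dat2)

# Position of the first repeated nest id (0 would mean no repeats)
anyDuplicated(dat2$NEST)
# Whole rows that repeat an earlier row
dat2[duplicated(dat2), ]

# Survivors (1) and losses (0) per group; with the real data each
# group's row should add up to the number of nests observed in it
with(dat2, table(Group, Survive))
```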
