I am using a GLM to analyze binomial data from one factor (Group) with three levels: Control, Control Treatment and Treatment.
m3 <- glm(Survive ~ Group, family = binomial, data = dat2)
summary(m3)
However, the model has taken Control as the intercept, and I'm not sure why. In previous analyses with GLMs I have never seen the levels of a factor presented separately in the summary:
Call:
glm(formula = Survive ~ Group, family = binomial, data = dat2)

Deviance Residuals:
   Min      1Q  Median      3Q     Max
-1.354  -1.177   0.000   1.177   1.354

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)      0.4055     0.6455   0.628    0.530
GroupCtrl Trt   -0.4055     0.9037  -0.449    0.654
GroupTreatment  -0.8109     0.9129  -0.888    0.374

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 41.589  on 29  degrees of freedom
Residual deviance: 40.783  on 27  degrees of freedom
AIC: 46.783

Number of Fisher Scoring iterations: 4
Edit 1:
Normally in the summary I would see an intercept and then a single line for the factor; I would only see the separate levels in a post-hoc multiple comparison. My data consist of two columns: one is the treatment (Ctrl, Ctrl Trt, Treatment), the other is binary: 1 for survival and 0 for loss.
NEST = nest id, not used in this analysis.
> str(dat2)
'data.frame': 30 obs. of 3 variables:
$ NEST : num 3 6 9 12 15 18 21 24 27 30 ...
$ Group : Factor w/ 3 levels "Control","Ctrl Trt",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Survive: num 1 1 0 0 0 1 0 1 1 1 ...
I do not want to omit the intercept, and I'm also confused about how this result could have come about.
Edit 2: adding + 0 to the model
Call:
glm(formula = Survive ~ Group + 0, family = binomial, data = dat2)

Deviance Residuals:
   Min      1Q  Median      3Q     Max
-1.354  -1.177   0.000   1.177   1.354

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
GroupControl     0.4055     0.6455   0.628     0.53
GroupCtrl Trt    0.0000     0.6325   0.000     1.00
GroupTreatment  -0.4055     0.6455  -0.628     0.53

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 41.589  on 30  degrees of freedom
Residual deviance: 40.783  on 27  degrees of freedom
AIC: 46.783

Number of Fisher Scoring iterations: 4
Edit 3: The 30 nests were observed in two series, and I'd like to add this as a factor:
'data.frame': 30 obs. of 4 variables:
$ NEST : Factor w/ 30 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Group : Factor w/ 3 levels "Control","Ctrl Trt",..: 3 2 1 3 2 1 3 2 1 3 ...
$ Survive: num 1 1 1 0 1 1 0 1 0 0 ...
$ Series : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
However, when I add this factor to the model, still including the + 0, I get confusing results again. For instance, the output does not include Series1:
Call:
glm(formula = Survive ~ Group * Series + 0, family = binomial, data = dat1)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.7941  -0.6681   0.0000   0.6681   1.7941

Coefficients:
                         Estimate Std. Error z value Pr(>|z|)
GroupControl           -4.055e-01  9.129e-01  -0.444    0.657
GroupCtrl Trt           1.386e+00  1.118e+00   1.240    0.215
GroupTreatment         -1.386e+00  1.118e+00  -1.240    0.215
Series2                 1.792e+00  1.443e+00   1.241    0.214
GroupCtrl Trt:Series2  -4.564e+00  2.141e+00  -2.132    0.033 *
GroupTreatment:Series2  1.133e-15  2.041e+00   0.000    1.000
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 41.589  on 30  degrees of freedom
Residual deviance: 33.476  on 24  degrees of freedom
AIC: 45.476

Number of Fisher Scoring iterations: 4
Best Answer
That's how "treatment contrasts" work. In this simple model, the first level of the factor is absorbed into the intercept column of the model matrix, and each remaining coefficient is the difference in log-odds from that base level. Each statistical system chooses its own default contrast strategy, so R's differs from that of SAS or SPSS. If the model were more complex, with multiple factor predictors, then the "Intercept" would apply to the cases that are at the base level of every factor. If there were continuous covariates, then the intercept would be the predicted "effect" for a hypothetical case with all factors at the base level and all continuous predictors at zero. (Obviously this might not be a physically interpretable scenario.) In this instance you could use a different formula to get the labeling you expected with:
glm(formula = Survive ~ Group + 0, family = binomial, data = dat2)
This is my attempt to reconstruct the results of that call:
Call:
glm(formula = Survive ~ Group + 0, family = binomial, data = dat2)

Deviance Residuals:
   Min      1Q  Median      3Q     Max
-1.354  -1.177   0.000   1.177   1.354

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
GroupControl     0.4055     0.6455   0.628     0.53
GroupCtrl Trt    0.0000     0.6325   0.000     1.00
GroupTreatment  -0.4055     0.6455  -0.628     0.53

(Dispersion parameter for binomial family taken to be 1)
So the coefficient for the "Ctrl Trt" group was zero, which implies exactly 50% survival in that group (odds of exp(0) = 1). When you omit the intercept in a single-factor model, each coefficient is simply the log-odds for the corresponding factor level. The "Treatment" coefficient suggests that 4 out of a group of 10 subjects survived, since
> exp(-.4055)
[1] 0.6666434
is very close to 4/(10-4). And in your "Control" group there were 6 survivors out of 10, since
> exp(0.4055)
[1] 1.500052
is very close to 6/(10-6). (Remember that we are modeling odds, not probabilities.)
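The arithmetic above can be checked with a small sketch. The data frame below is hypothetical: it assumes 10 nests per group with 6, 5 and 4 survivors, counts inferred from the coefficients rather than taken from the real dat2. The names dat, m0, log_odds, odds and prob are made up for the illustration.

```r
# Hypothetical data: 10 nests per group with 6, 5 and 4 survivors
# (inferred from the reported coefficients, NOT the poster's actual dat2).
dat <- data.frame(
  Group = factor(rep(c("Control", "Ctrl Trt", "Treatment"), each = 10),
                 levels = c("Control", "Ctrl Trt", "Treatment")),
  Survive = c(rep(c(1, 0), c(6, 4)),   # Control:   6 survive, 4 lost
              rep(c(1, 0), c(5, 5)),   # Ctrl Trt:  5 survive, 5 lost
              rep(c(1, 0), c(4, 6)))   # Treatment: 4 survive, 6 lost
)

# Without an intercept, each coefficient is the log-odds of survival
# within one group.
m0 <- glm(Survive ~ Group + 0, family = binomial, data = dat)
log_odds <- coef(m0)

odds <- exp(log_odds)      # odds of survival, p / (1 - p)
prob <- plogis(log_odds)   # inverse logit, the survival probability p

print(round(cbind(log_odds, odds, prob), 4))
```

Under these assumed counts the odds come out as 6/4, 5/5 and 4/6, which is the same pattern as the coefficients in the output above.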
In general, it's better (as in less confusing to the "uninitiated") to not omit the intercept, but for a single factor model it can be helpful.
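To see where the coefficient labels come from, it can help to look at the model matrix directly. The sketch below uses small made-up factors (Group, Series, Group2 and the X_* objects are illustrative names, not the poster's data) and only base R functions: model.matrix() and relevel().

```r
# Made-up factor with the same level names as in the question.
Group <- factor(rep(c("Control", "Ctrl Trt", "Treatment"), each = 2))

# Default treatment contrasts: the first level ("Control") is absorbed
# into the intercept; the other columns are 0/1 indicators for a level.
X_int <- model.matrix(~ Group)
print(colnames(X_int))

# Dropping the intercept gives one indicator column per level.
X_noint <- model.matrix(~ Group + 0)
print(colnames(X_noint))

# relevel() changes which level serves as the base level.
Group2 <- relevel(Group, ref = "Treatment")
X_relev <- model.matrix(~ Group2)
print(colnames(X_relev))

# With a second factor, "+ 0" only frees up the first factor; the second
# factor still loses its base level, so there is no "Series1" column.
Series <- factor(rep(c("1", "2"), times = 3))
X_two <- model.matrix(~ Group * Series + 0)
print(colnames(X_two))
```

The last call gives the same six column names as the coefficient table of the Group * Series + 0 fit in the question, which is why Series1 does not appear there: it is the base level of Series.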
I'm actually having difficulty figuring out how you could have produced that particular result (two levels whose coefficients are, in absolute value, exactly one-half of the value of the third level). I'm wondering whether you have somehow duplicated cases. You should a) describe the data collection and b) post the output of str(dat2).
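The checks suggested above could look roughly like this. The dat2 built here is a hypothetical stand-in (30 nests, 10 per group, with assumed survivor counts of 6, 5 and 4), so only the function calls carry over to the real data; dup_nests, counts and surv_prop are illustrative names.

```r
# Hypothetical stand-in for dat2; the real data would be used instead.
dat2 <- data.frame(
  NEST = 1:30,
  Group = factor(rep(c("Control", "Ctrl Trt", "Treatment"), each = 10)),
  Survive = c(rep(c(1, 0), c(6, 4)),
              rep(c(1, 0), c(5, 5)),
              rep(c(1, 0), c(4, 6)))
)

# Structure of the data frame: column types and factor levels.
str(dat2)

# Index of the first repeated nest id, or 0 if every nest appears once.
dup_nests <- anyDuplicated(dat2$NEST)
print(dup_nests)

# Cross-tabulate group against outcome to see the raw counts per cell.
counts <- table(dat2$Group, dat2$Survive)
print(counts)

# Observed survival proportion in each group.
surv_prop <- tapply(dat2$Survive, dat2$Group, mean)
print(surv_prop)
```

Comparing the cell counts with the fitted coefficients is a quick way to confirm that the model output reflects the data as collected.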