To my surprise, I found that the standard errors, and thus the Wald confidence intervals, became smaller when I removed the intercept from a simple logistic regression model fitted with `glm()` in R.
```r
# load an object named "my.df" into the global environment
load(url("https://hansekbrand.se/code/test.RData"))

# fit a model with intercept to data
my.fit <- glm(deprived.of.education ~ religion, data = my.df,
              family = binomial("logit"))

# fit a model without any intercept to data
my.fit.without.intercept <- glm(deprived.of.education ~ 0 + religion, data = my.df,
                                family = binomial("logit"))

# inspect the first fit
summary(my.fit)$coefficients
#                        Estimate Std. Error   z value     Pr(>|z|)
# (Intercept)          -2.8718056 0.03175130 -90.44687 0.000000e+00
# religionChristianity  0.4934891 0.03234887  15.25522 1.519805e-52
# religionHinduism      0.5257316 0.03376535  15.57015 1.161317e-54
# religionIslam         1.5734832 0.03231692  48.68914 0.000000e+00
# religionNonreligious  1.5975456 0.03555164  44.93592 0.000000e+00

# inspect the second fit
summary(my.fit.without.intercept)$coefficients
#                       Estimate  Std. Error    z value Pr(>|z|)
# religionBuddhism     -2.871806 0.031751299  -90.44687        0
# religionChristianity -2.378317 0.006189045 -384.27842        0
# religionHinduism     -2.346074 0.011487113 -204.23530        0
# religionIslam        -1.298322 0.006019850 -215.67354        0
# religionNonreligious -1.274260 0.015992939  -79.67642        0
```
I understand why the z values are different, because the null hypotheses in the two cases are different. In the first case, with the intercept, the null is "same as the reference category", while without the intercept, the null becomes "zero".
But I do not understand the large difference in standard errors between the two models.
Without the intercept, the standard errors seem to vary with the number of observations at each level: "Christianity" and "Islam" have many cases and small standard errors. With the intercept, however, the standard errors hardly vary at all.
Could someone please explain the reason for the differences in the magnitude of the standard errors between the two models?
I would like to calculate probabilities and confidence intervals around them, and I have done so using the estimates from the first model. If I did that with the estimates from the second model, the confidence intervals would be much smaller, but would they be reliable?
Your coefficients, even when they share names, are not the same: their interpretation is different.
In the first model, the effect of `religionChristianity` is a change in the outcome (on the log-odds scale) with respect to the baseline (`religionBuddhism`), i.e. a relative effect. In the second model, the effect of `religionChristianity` is the log-odds of the Christianity group itself, i.e. an absolute effect.
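The relationship between the two sets of coefficients can be checked directly. The sketch below does not use `my.df`; it simulates a hypothetical stand-in with three groups of unequal size (the sizes and probabilities are made up), fits both parametrizations, and shows that adding the intercept to each level offset of the first model reproduces the coefficients of the second, which are simply the empirical log-odds of each group.

```r
# Hypothetical stand-in for my.df: group sizes and probabilities are made up.
set.seed(1)
religion <- factor(rep(c("Buddhism", "Christianity", "Islam"),
                       times = c(400, 9000, 6000)))
deprived <- rbinom(length(religion), 1,
                   c(0.05, 0.08, 0.2)[as.integer(religion)])

fit.int   <- glm(deprived ~ religion,     family = binomial("logit"))
fit.noint <- glm(deprived ~ 0 + religion, family = binomial("logit"))

b.int   <- coef(fit.int)
b.noint <- coef(fit.noint)

# Relative parametrization: (Intercept) is the Buddhism log-odds and the
# other coefficients are offsets from it; adding the intercept back
# recovers the absolute log-odds of each group.
recovered <- c(b.int[1], b.int[1] + b.int[-1])

# Absolute parametrization: each coefficient is the log-odds of its own group.
counts <- table(religion, deprived)
emp.logodds <- log(counts[, "1"] / counts[, "0"])

print(cbind(recovered    = unname(recovered),
            no.intercept = unname(b.noint),
            empirical    = as.numeric(emp.logodds)))
```

The three columns agree up to the convergence tolerance of `glm()`: the two models are the same fit, only labelled differently.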
The estimates agree numerically, $-2.8718056 + 0.4934891 = -2.3783165 \approx -2.378317$, but in the first model the Christianity log-odds is the sum of two coefficients. So you should compare the joint significance of `(Intercept) + religionChristianity` in the first model with the significance of `religionChristianity` alone in the second one. Likewise, you should compare a confidence interval for that sum (first model) with a simple one (second model).
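The standard errors in the question fit this picture. Write $\hat\beta_j$ for the no-intercept coefficient of group $j$, with $y_j$ events among $n_j$ cases. Each $\hat\beta_j$ is estimated from that group's observations only, so the estimates are uncorrelated and each standard error depends only on that group's counts. A first-model coefficient such as `religionChristianity` is a difference of two such log-odds, so its variance is a sum of two variances, and the Buddhism term, the least precisely estimated one, dominates every such sum. With the numbers from the two summaries above:

$$
\begin{aligned}
\widehat{\mathrm{SE}}(\hat\beta_j) &= \sqrt{\frac{1}{y_j} + \frac{1}{n_j - y_j}},\\
\widehat{\mathrm{SE}}(\hat\beta_{\mathrm{Chr}} - \hat\beta_{\mathrm{Bud}}) &= \sqrt{0.006189045^2 + 0.031751299^2} \approx 0.032349,\\
\widehat{\mathrm{SE}}(\hat\beta_{\mathrm{Hin}} - \hat\beta_{\mathrm{Bud}}) &= \sqrt{0.011487113^2 + 0.031751299^2} \approx 0.033765,
\end{aligned}
$$

which reproduce the standard errors of `religionChristianity` and `religionHinduism` in the first model. Going the other way, with $b_0$ the first model's `(Intercept)` and $b_1$ its `religionChristianity`, we have $\widehat{\mathrm{Cov}}(b_0, b_1) = -\widehat{\mathrm{Var}}(b_0)$, so

$$
\widehat{\mathrm{SE}}(b_0 + b_1) = \sqrt{\widehat{\mathrm{Var}}(b_0) + \widehat{\mathrm{Var}}(b_1) + 2\,\widehat{\mathrm{Cov}}(b_0, b_1)} = \sqrt{0.03234887^2 - 0.03175130^2} \approx 0.006189,
$$

the standard error of `religionChristianity` in the second model. Neither set of standard errors is wrong; they belong to different quantities.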
The simple (profile-likelihood) CI for `religionChristianity` in the second model:

```
> confint(my.fit.without.intercept)
Waiting for profiling to be done...
                         2.5 %    97.5 %
...
religionChristianity -2.390467 -2.366207
```
There are several ways to compute joint intervals. Using simulation from the `arm` package:

```
> library(arm)
> n.sims <- 1000
> sim.i <- sim(my.fit, n.sims)
> intercept.plus.christianity <- sim.i@coef[,1] + sim.i@coef[,2]
> quantile(intercept.plus.christianity, c(0.025, 0.975))
     2.5%     97.5%
-2.390826 -2.366828
```
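Simulation is not the only route. Because `(Intercept) + religionChristianity` is a linear combination of coefficients, its Wald standard error follows directly from the coefficient covariance matrix $V$ as $\sqrt{c^\top V c}$. The sketch below again uses a simulated stand-in for `my.df` (group sizes and probabilities are hypothetical) and checks that the Wald interval for the sum in the first model equals the simple Wald interval for `religionChristianity` in the second model.

```r
# Hypothetical stand-in for my.df: group sizes and probabilities are made up.
set.seed(42)
religion <- factor(rep(c("Buddhism", "Christianity", "Islam"),
                       times = c(400, 9000, 6000)))
deprived <- rbinom(length(religion), 1,
                   c(0.05, 0.08, 0.2)[as.integer(religion)])

fit.int   <- glm(deprived ~ religion,     family = binomial("logit"))
fit.noint <- glm(deprived ~ 0 + religion, family = binomial("logit"))

# Contrast vector selecting (Intercept) + religionChristianity.
cvec <- c(1, 1, 0)
est  <- sum(cvec * coef(fit.int))
se   <- sqrt(drop(t(cvec) %*% vcov(fit.int) %*% cvec))

# Wald 95% interval for the Christianity log-odds from the first model ...
wald.int <- est + c(-1, 1) * qnorm(0.975) * se

# ... and the same interval read directly off the second model.
se.noint   <- sqrt(vcov(fit.noint)["religionChristianity", "religionChristianity"])
wald.noint <- coef(fit.noint)[["religionChristianity"]] +
  c(-1, 1) * qnorm(0.975) * se.noint

print(rbind(wald.int, wald.noint))
```

The two rows coincide because both fits have the same fitted values; only the labelling of the coefficients differs. Unlike the `arm` simulation, this interval has no Monte Carlo noise.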
Can you see any significant (relevant) difference?
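On the original goal of probabilities with confidence intervals: one common approach (a sketch, not something shown in the answer above) is to build a Wald interval on the log-odds scale and back-transform it with `plogis()`. Because both parametrizations describe the same fitted model, `predict(..., type = "link", se.fit = TRUE)` returns the same standard errors from either fit, so the resulting intervals coincide. The data are again a simulated, hypothetical stand-in for `my.df`.

```r
# Hypothetical stand-in for my.df: group sizes and probabilities are made up.
set.seed(7)
religion <- factor(rep(c("Buddhism", "Christianity", "Islam"),
                       times = c(400, 9000, 6000)))
deprived <- rbinom(length(religion), 1,
                   c(0.05, 0.08, 0.2)[as.integer(religion)])
d <- data.frame(religion, deprived)

fit.int   <- glm(deprived ~ religion,     data = d, family = binomial("logit"))
fit.noint <- glm(deprived ~ 0 + religion, data = d, family = binomial("logit"))

# One row per religion to predict for.
newd <- data.frame(religion = factor(levels(d$religion),
                                     levels = levels(d$religion)))

prob.ci <- function(fit) {
  # Wald interval on the link (log-odds) scale, then back-transform,
  # which keeps the limits inside (0, 1).
  p <- predict(fit, newdata = newd, type = "link", se.fit = TRUE)
  z <- qnorm(0.975)
  data.frame(religion = newd$religion,
             prob  = plogis(p$fit),
             lower = plogis(p$fit - z * p$se.fit),
             upper = plogis(p$fit + z * p$se.fit))
}

ci.int   <- prob.ci(fit.int)
ci.noint <- prob.ci(fit.noint)
print(ci.int)
```

Computed this way, the interval for a group's probability does not depend on whether the model has an intercept.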