I'm trying to replicate a model fitted in SAS in R but the fit I'm getting gives me slightly different coefficients and standard errors.
Data:
testdata <- data.frame(matrix(c("f","Test", 1.75, 16, 0, 16, 0, 1, 1, "m", "Test", 1.75, 15, 1, 16, 6.25, 1, 0, "f", "Test", 2.75, 4, 12, 16, 75, 1, 1, "m", "Test", 2.75, 9, 6, 15, 40, 1, 0, "f", "WHO", 1.75, 15, 1, 16, 6.25, 0, 1, "m", "WHO", 1.75, 14, 2, 16, 12.5, 0, 0, "f", "WHO", 2.75, 2, 13, 15, 86.6667, 0, 1, "m", "WHO", 2.75, 3, 13, 16, 81.25, 0, 0 ), ncol=9, byrow=TRUE)) names(testdata) <- c("sex", "vaccine", "dose", "not_p", "para", "n", "pct", "vacnum", "sexno")
SAS:
proc genmod data=model_data; class sex; model para/n = dose sex vacnum /dist=bin link=logit type3; run; Analysis Of Maximum Likelihood Parameter Estimates Parameter DF Estimate Std Error Wald 95% Conf Lim Wald Chi-Square Pr > ChiSq Intercept 1 -9.4020 1.6220 -12.5810 -6.2230 33.60 <.0001 dose 1 3.9208 0.6460 2.6546 5.1870 36.83 <.0001 sex f 1 0.5574 0.5184 -0.4587 1.5735 1.16 0.2823 sex m 0 0.0000 0.0000 0.0000 0.0000 . . vacnum 1 -1.3221 0.5483 -2.3967 -0.2475 5.81 0.0159 Scale 0 1.0000 0.0000 1.0000 1.0000
R:
testdata$sexno <- as.factor(testdata$sexno) a <- contr.treatment(2, base = 1, contrasts = TRUE) contrasts(testdata$sexno) <- a fitreduced <- glm(para/n ~ dose + as.factor(sex) + vacnum, family=quasibinomial(link="logit"), data=testdata) coef(summary(fitreduced)) Estimate Std. Error t value Pr(>|t|) (Intercept) -9.4013750 1.7613982 -5.337450 0.005935450 dose 3.9173794 0.7001133 5.595351 0.005007179 as.factor(sex)1 0.5704671 0.5568436 1.024466 0.363525300 vacnum -1.3336100 0.5887552 -2.265135 0.086189704
I believe I have the right contrasts to give me a type III SS but there is a small discrepency in values, have a missed something here?
Best Answer
I notice several things here.
First, when you enter your data via matrix
, all the data have to be the same type. Thus, they are coerced to be the most inclusive type, strings, which in turn are coerced to be factors by default. Note:
testdata <- data.frame(matrix(c("f","Test", 1.75, 16, 0, 16, 0, 1, 1, ... sapply(testdata, class) # sex vaccine dose not_p para n pct vacnum sexno # "factor" "factor" "factor" "factor" "factor" "factor" "factor" "factor" "factor"
Try using read.table(text='...', sep=",")
instead:
testdata <- read.table(text='"f", "Test", 1.75, 16, 0, 16, 0, 1, 1 "m", "Test", 1.75, 15, 1, 16, 6.25, 1, 0 "f", "Test", 2.75, 4, 12, 16, 75, 1, 1 "m", "Test", 2.75, 9, 6, 15, 40, 1, 0 "f", "WHO", 1.75, 15, 1, 16, 6.25, 0, 1 "m", "WHO", 1.75, 14, 2, 16, 12.5, 0, 0 "f", "WHO", 2.75, 2, 13, 15, 86.6667, 0, 1 "m", "WHO", 2.75, 3, 13, 16, 81.25, 0, 0', sep=",") names(testdata) <- c("sex", "vaccine", "dose", "not_p", "para", "n", "pct", "vacnum", "sexno") sapply(testdata, class) # sex vaccine dose not_p para n pct vacnum # "factor" "factor" "numeric" "integer" "integer" "integer" "numeric" "integer" # sexno # "integer"
(That was small potatoes.) The next trap to worry about is that SAS and R code logistic regression for binomial data differently. SAS uses "events over trials", but R uses the odds, successes/failures. Thus, your model formula should be:
form <- as.formula("cbind(para, n-para) ~ dose + sex + vacnum")
Finally, you specified family=quasibinomial
(i.e., the quasibinomial) in your R code, but DIST=BIN
(i.e., the binomial) in your SAS code. To match the SAS output, use the binomial instead. Thus, your final model is:
fitreduced <- glm(form, family=binomial(link="logit"), data=testdata) coef(summary(fitreduced)) # Estimate Std. Error z value Pr(>|z|) # (Intercept) -9.4020028 1.6219570 -5.796703 6.763131e-09 # dose 3.9207805 0.6460193 6.069138 1.285986e-09 # sexf 0.5574087 0.5184112 1.075225 2.822741e-01 # vacnum -1.3221011 0.5482645 -2.411430 1.589012e-02
This seems to match the SAS estimates and standard errors.
Similar Posts:
- Solved – Difference in output between SAS’s proc genmod and R’s glm
- Solved – Difference in output between SAS’s proc genmod and R’s glm
- Solved – GLM: binomial(logit) with weights=tested
- Solved – Calculate the intercept and coefficient in Logistic Regression by hand (manually)
- Solved – Calculate the intercept and coefficient in Logistic Regression by hand (manually)