Solved – Difference in output between SAS’s proc genmod and R’s glm

I'm trying to replicate a model fitted in SAS in R but the fit I'm getting gives me slightly different coefficients and standard errors.

Data:

testdata <- data.frame(matrix(c("f","Test", 1.75,   16, 0,  16, 0,  1,  1, "m",    "Test", 1.75,   15, 1,  16, 6.25,   1,  0, "f",    "Test", 2.75,   4,  12, 16, 75, 1,  1, "m",    "Test", 2.75,   9,  6,  15, 40, 1,  0, "f",    "WHO",  1.75,   15, 1,  16, 6.25,   0,  1, "m",    "WHO",  1.75,   14, 2,  16, 12.5,   0,  0, "f",    "WHO",  2.75,   2,  13, 15, 86.6667,    0,  1, "m",    "WHO",  2.75,   3,  13, 16, 81.25,  0,  0 ), ncol=9, byrow=TRUE)) names(testdata) <- c("sex", "vaccine", "dose", "not_p", "para", "n", "pct",                       "vacnum", "sexno") 

SAS:

proc genmod data=model_data; class sex; model para/n  = dose sex vacnum    /dist=bin    link=logit   type3; run;  Analysis Of Maximum Likelihood Parameter Estimates  Parameter   DF Estimate Std Error Wald 95% Conf Lim Wald Chi-Square Pr > ChiSq  Intercept   1 -9.4020   1.6220    -12.5810  -6.2230  33.60           <.0001  dose        1  3.9208   0.6460      2.6546   5.1870  36.83           <.0001  sex f       1  0.5574   0.5184     -0.4587   1.5735   1.16           0.2823  sex m       0  0.0000   0.0000      0.0000   0.0000    .              .  vacnum      1 -1.3221   0.5483     -2.3967  -0.2475   5.81           0.0159  Scale       0  1.0000   0.0000      1.0000   1.0000      

R:

testdata$sexno <- as.factor(testdata$sexno)     a <- contr.treatment(2, base = 1, contrasts = TRUE)  contrasts(testdata$sexno) <- a  fitreduced <- glm(para/n ~ dose + as.factor(sex) + vacnum,                    family=quasibinomial(link="logit"), data=testdata)  coef(summary(fitreduced))                    Estimate Std. Error   t value    Pr(>|t|) (Intercept)     -9.4013750  1.7613982 -5.337450 0.005935450 dose             3.9173794  0.7001133  5.595351 0.005007179 as.factor(sex)1  0.5704671  0.5568436  1.024466 0.363525300 vacnum          -1.3336100  0.5887552 -2.265135 0.086189704 

I believe I have the right contrasts to give me a type III SS but there is a small discrepency in values, have a missed something here?

I notice several things here.

First, when you enter your data via matrix, all the data have to be the same type. Thus, they are coerced to be the most inclusive type, strings, which in turn are coerced to be factors by default. Note:

testdata <- data.frame(matrix(c("f","Test", 1.75,   16, 0,  16, 0,  1,  1, ... sapply(testdata, class) #      sex  vaccine     dose    not_p     para        n      pct   vacnum    sexno # "factor" "factor" "factor" "factor" "factor" "factor" "factor" "factor" "factor" 

Try using read.table(text='...', sep=",") instead:

testdata <- read.table(text='"f", "Test", 1.75,   16,   0,  16,  0,      1,  1 "m", "Test", 1.75,   15,   1,  16,  6.25,   1,  0 "f", "Test", 2.75,    4,  12,  16, 75,      1,  1 "m", "Test", 2.75,    9,   6,  15, 40,      1,  0 "f", "WHO",  1.75,   15,   1,  16,  6.25,   0,  1 "m", "WHO",  1.75,   14,   2,  16, 12.5,    0,  0 "f", "WHO",  2.75,    2,  13,  15, 86.6667, 0,  1 "m", "WHO",  2.75,    3,  13,  16, 81.25,   0,  0', sep=",") names(testdata) <- c("sex", "vaccine", "dose", "not_p", "para", "n", "pct",                       "vacnum", "sexno") sapply(testdata, class) #      sex   vaccine      dose     not_p      para         n       pct    vacnum  # "factor"  "factor" "numeric" "integer" "integer" "integer" "numeric" "integer"  #     sexno  # "integer"  

(That was small potatoes.) The next trap to worry about is that SAS and R code logistic regression for binomial data differently. SAS uses "events over trials", but R uses the odds, successes/failures. Thus, your model formula should be:

form <- as.formula("cbind(para, n-para) ~ dose + sex + vacnum") 

Finally, you specified family=quasibinomial (i.e., the quasibinomial) in your R code, but DIST=BIN (i.e., the binomial) in your SAS code. To match the SAS output, use the binomial instead. Thus, your final model is:

fitreduced <- glm(form, family=binomial(link="logit"), data=testdata) coef(summary(fitreduced)) #               Estimate Std. Error   z value     Pr(>|z|) # (Intercept) -9.4020028  1.6219570 -5.796703 6.763131e-09 # dose         3.9207805  0.6460193  6.069138 1.285986e-09 # sexf         0.5574087  0.5184112  1.075225 2.822741e-01 # vacnum      -1.3221011  0.5482645 -2.411430 1.589012e-02 

This seems to match the SAS estimates and standard errors.

Similar Posts:

Rate this post

Leave a Comment