I have run various logistic regressions (GLM) from the binomial family and they have produced some very interesting results. I would now like to run post-hoc tests to find out which levels of the explanatory variables are significant, but I am finding it very difficult to find a post-hoc test that is compatible with my data.
I am looking at a population of skeletons. I have recorded the presence/absence of infectious lesions at the level of the individual. I have a response variable (Periostitis, which is a type of lesion and which has two levels: present and absent) and I have four explanatory variables (Age.Group, Sex, Cemetery, and Time.Period). These represent the age of the individual (A, B, C, and D), the sex of the individual (Male or Female), the cemetery the individual was recovered from (Cem1, Cem2, and Cem3), and the time period the individual is dated to (TP1 and TP2), respectivly.
I have run logistic regressions because my data is binomial. An example of which can be seen here:
Call: glm(formula = Periosna$Periostitis ~ Periosna$Cemetery, family = binomial) Deviance Residuals: Min 1Q Median 3Q Max -1.9103 0.5931 0.6841 0.6841 0.9537 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 1.3332 0.2087 6.387 1.69e-10 *** Periosna$CemeteryCem2 0.3155 0.5311 0.594 0.5525 Periosna$CemeteryCem3 -0.7811 0.3557 -2.196 0.0281 * --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 244.03 on 221 degrees of freedom Residual deviance: 238.04 on 219 degrees of freedom AIC: 244.04 Number of Fisher Scoring iterations: 4
I now want to run a post-hoc test to see which levels (Cem1, Cem2, or Cem3) are significant. I have read everything on this site and on others and have tried to run the glht function from the multcomp package. I see that there are different types of tests (matrix, character, expression, mcp, and mlf), but I do not know which to use. There is a lot of literature online about mcp, but I do not think that I can use this as my data is binomial. When I try to use it, I get an error code:
> library(multcomp) > > results<-glht(modelA,linfct=mcp(Cemetery="Tukey")) Error in mcp2matrix(model, linfct = linfct) : Variable(s) ‘Cemetery’ have been specified in ‘linfct’ but cannot be found in ‘model’!
What would you suggest I use to analyze the levels of my explanatory variables?
Thank you very much for your help. I am stumped!
Best Answer
The problem you haveg is that are abusing (misusing) the formula notation. You never use a formula like this:
Periosna$Periostitis ~ Periosna$Cemetery
because, as you've found out it, it totally breaks the logic that you want to then use for subsequent operations.
You asked mcp()
to set up Tukey contrasts for the covariate Cemetery
, yet you fitted a model with covariate name Periosna$Cemetery
, hence it quite rightly told you that a variable of the stated name was not involved in the model fit.
Instead, the call should have been
glm(Periostitis ~ Cemetery, data = Periosna, family = binomial)
noting that we use the variable names directly, as they exist in Periosna
, and we tell R where to locate the variables in the formula via the data
argument.
Now when you call glht()
it should be able to find the Cemetery
variable.