Solved – How to plot negative binomial data as a function of a categorical variable from a GLM

How do you correctly plot results from a GLM used to test a categorical variable? Here is a reproducible example in R (the data are listed below the code):

# run the following code if you don't have the libraries installed:  #  install.packages("dplyr","ggplot2", "MASS")  # loading libraries library(dplyr) library(ggplot2) library(MASS)  # making fake data - including response (rs) from a negative binomial distribution  # and a two-level categorical variable (type) set.seed(246) df <- data.frame(rs   = c(rnegbin(n=100, mu=2, theta=1),                           rnegbin(n=30,  mu=3, theta=10)),                  type = c(rep("A", times=100), rep("B", times=30)))  # now doing the stats m1 <- glm.nb(rs~type, df) anova(m1) summary(m1)  # make summary table containing means, standard deviation (sd), # count (n), and standard error (se) for each Type (A and B) # going to use to graph results  df1 <- df %>%        group_by(type) %>%        summarise(means = mean(rs),                  sd    = sd(rs),                  n     = n(),                  se    = sd/sqrt(n)) df1  # now making the graph with df1 ggplot(df1, aes(x=type,y=means)) +   geom_bar(stat="identity", color="black",fill="grey") +   geom_errorbar(aes(ymax = means + se, ymin =  means - se), width = 0.2) +   labs(y ="Mean response",x= "Categorical variable") +   theme_bw() +   theme(axis.text = element_text(size=22),         axis.title = element_text(size=24))  # the generated data:  rs type   2    A   0    A   1    A   1    A   2    A   0    A   0    A   2    A   0    A   3    A   9    A   0    A  11    A   0    A   0    A   4    A   4    A   3    A   1    A   5    A   1    A   0    A   0    A   1    A   0    A   6    A  13    A   3    A   2    A   5    A   1    A   0    A   4    A   1    A   4    A   1    A   1    A   3    A   3    A   3    A   3    A   1    A   2    A   0    A   2    A   6    A   2    A   3    A   0    A   2    A   1    A   2    A   1    A   0    A   0    A   0    A   0    A   0    A   0    A   0    A   1    A   0    A   3    A   1    A   1    A   1    A   1    A   4    A   0    A   3    A   4    A   0    A   1    A   9    A   6    A   0    A   0    A   0    A   1    A   8    A   3    A   0    A   1    A   0    A   4    A   3    A   0    A   5    A   1    A   1    A   0    A   1    A   1    A  13    A   1    A   0    A   6    A   3    A   0    A   1    A   1    B   4    B   4    B   5    B   6    B   2    B   4    B   2    B   3    B   1    B   2    B   5    B   2    B   2    B   4    B   1    B   3    B   4    B   2    B   5    B   5    B   4    B   6    B   1    B   5    B   2    B   1    B   5    B   3    B   2    B 

enter image description here

The GLM says there is no statistical difference between Type A and Type B. I want to plot this result. I made a summary table that I use to plot the mean response for each Type and include error bars. I think I must be calculating standard error incorrectly. Do you have any advice for how to properly plot my results?

Let's calculate predicted values and confidence intervals from your model, with built-in functions

pdat <- expand.grid(type = c(levels(df$type))) pred <- predict(m1, newdata = pdat,  se.fit = T) cumlut <- 2 # could use 1.96 predframe <- data.frame(type = pdat,                     fit = pred$fit,                    ci.lower = pred$fit - cmult*pred$se.fit,                     ci.upper = pred$fit + cmult*pred$se.fit) head(predframe) type      fit  ci.lower  ci.upper 1    A 0.756122 0.5439703 0.9682737 2    B 1.163151 0.8034228 1.5228788 

or by hand

pdat <- expand.grid(type = c(levels(df$type))) X <- model.matrix(formula(m1)[-2], pdat) fit <- X %*% coef(m1) pvar <- diag(X %*% vcov(m1) %*% t(X)) predframe <- data.frame(type = pdat,                         fit,                         ci.lower = fit - cmult*sqrt(pvar),                          ci.upper = fit + cmult*sqrt(pvar)) head(predframe) type      fit  ci.lower  ci.upper 1    A 0.756122 0.5439703 0.9682737 2    B 1.163151 0.8034228 1.5228788 

They are equivalent. Now, using your code:

ggplot(predframe, aes(x=type,y=fit))+ geom_bar(stat="identity", color="black",fill="grey")+ geom_errorbar(aes(ymax = ci.upper, ymin =  ci.lower), width = 0.2)+ labs(y ="Mean response",x= "Categorical variable")+ theme_bw()+ theme(axis.text = element_text(size=22),     axis.title = element_text(size=24)) 

enter image description here

You should look at confidence intervals rather than standard error when you make any inference using your model. The differnce between this and your plot is that the standard errors you have calculated are from your dataset (raw data), while the one you should use to plot confidence intervals are the one you get from the model.

Similar Posts:

Rate this post

Leave a Comment