# Solved – How to plot negative binomial data as a function of a categorical variable from a GLM

How do you correctly plot results from a GLM used to test a categorical variable? Here is a reproducible example in R (the data are listed below the code):

``# run the following code if you don't have the libraries installed:  #  install.packages("dplyr","ggplot2", "MASS")  # loading libraries library(dplyr) library(ggplot2) library(MASS)  # making fake data - including response (rs) from a negative binomial distribution  # and a two-level categorical variable (type) set.seed(246) df <- data.frame(rs   = c(rnegbin(n=100, mu=2, theta=1),                           rnegbin(n=30,  mu=3, theta=10)),                  type = c(rep("A", times=100), rep("B", times=30)))  # now doing the stats m1 <- glm.nb(rs~type, df) anova(m1) summary(m1)  # make summary table containing means, standard deviation (sd), # count (n), and standard error (se) for each Type (A and B) # going to use to graph results  df1 <- df %>%        group_by(type) %>%        summarise(means = mean(rs),                  sd    = sd(rs),                  n     = n(),                  se    = sd/sqrt(n)) df1  # now making the graph with df1 ggplot(df1, aes(x=type,y=means)) +   geom_bar(stat="identity", color="black",fill="grey") +   geom_errorbar(aes(ymax = means + se, ymin =  means - se), width = 0.2) +   labs(y ="Mean response",x= "Categorical variable") +   theme_bw() +   theme(axis.text = element_text(size=22),         axis.title = element_text(size=24))  # the generated data:  rs type   2    A   0    A   1    A   1    A   2    A   0    A   0    A   2    A   0    A   3    A   9    A   0    A  11    A   0    A   0    A   4    A   4    A   3    A   1    A   5    A   1    A   0    A   0    A   1    A   0    A   6    A  13    A   3    A   2    A   5    A   1    A   0    A   4    A   1    A   4    A   1    A   1    A   3    A   3    A   3    A   3    A   1    A   2    A   0    A   2    A   6    A   2    A   3    A   0    A   2    A   1    A   2    A   1    A   0    A   0    A   0    A   0    A   0    A   0    A   0    A   1    A   0    A   3    A   1    A   1    A   1    A   1    A   4    A   0    A   3    A   4    A   0    A   1    A   9    A   6    A   0    A   0    A   0    A   1    A   8    A   3    A   0    A   1    A   0    A   4    A   3    A   0    A   5    A   1    A   1    A   0    A   1    A   1    A  13    A   1    A   0    A   6    A   3    A   0    A   1    A   1    B   4    B   4    B   5    B   6    B   2    B   4    B   2    B   3    B   1    B   2    B   5    B   2    B   2    B   4    B   1    B   3    B   4    B   2    B   5    B   5    B   4    B   6    B   1    B   5    B   2    B   1    B   5    B   3    B   2    B `` The GLM says there is no statistical difference between Type A and Type B. I want to plot this result. I made a summary table that I use to plot the mean response for each Type and include error bars. I think I must be calculating standard error incorrectly. Do you have any advice for how to properly plot my results?

Contents

Let's calculate predicted values and confidence intervals from your model, with built-in functions

``pdat <- expand.grid(type = c(levels(df\$type))) pred <- predict(m1, newdata = pdat,  se.fit = T) cumlut <- 2 # could use 1.96 predframe <- data.frame(type = pdat,                     fit = pred\$fit,                    ci.lower = pred\$fit - cmult*pred\$se.fit,                     ci.upper = pred\$fit + cmult*pred\$se.fit) head(predframe) type      fit  ci.lower  ci.upper 1    A 0.756122 0.5439703 0.9682737 2    B 1.163151 0.8034228 1.5228788 ``

or by hand

``pdat <- expand.grid(type = c(levels(df\$type))) X <- model.matrix(formula(m1)[-2], pdat) fit <- X %*% coef(m1) pvar <- diag(X %*% vcov(m1) %*% t(X)) predframe <- data.frame(type = pdat,                         fit,                         ci.lower = fit - cmult*sqrt(pvar),                          ci.upper = fit + cmult*sqrt(pvar)) head(predframe) type      fit  ci.lower  ci.upper 1    A 0.756122 0.5439703 0.9682737 2    B 1.163151 0.8034228 1.5228788 ``

They are equivalent. Now, using your code:

``ggplot(predframe, aes(x=type,y=fit))+ geom_bar(stat="identity", color="black",fill="grey")+ geom_errorbar(aes(ymax = ci.upper, ymin =  ci.lower), width = 0.2)+ labs(y ="Mean response",x= "Categorical variable")+ theme_bw()+ theme(axis.text = element_text(size=22),     axis.title = element_text(size=24)) `` You should look at confidence intervals rather than standard error when you make any inference using your model. The differnce between this and your plot is that the standard errors you have calculated are from your dataset (raw data), while the one you should use to plot confidence intervals are the one you get from the model.

Rate this post