Logistic regression: what is the link between the binomial family and the binomial distribution?

When I want to estimate a logistic regression in R, I type

fit <- glm(y ~ x, data = df, family = binomial(link = "logit"))

I wonder what the relationship is between the "binomial family" and the binomial distribution.

Logistic regression in effect estimates the probability of an event happening based on the values of the independent variables, or more precisely it estimates the log-odds, a monotonic function of that probability.
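As a small illustration of the log-odds being a monotonic function of the probability, here is a sketch using base R's qlogis() and plogis(). The probabilities in p are arbitrary values chosen for the example.

```r
# Arbitrary probabilities strictly between 0 and 1 (illustrative values)
p <- c(0.1, 0.25, 0.5, 0.75, 0.9)

# Log-odds (logit) computed by hand and with the built-in function
log_odds <- log(p / (1 - p))
all.equal(log_odds, qlogis(p))

# The logit is strictly increasing, so the ordering of the probabilities is preserved
all(diff(log_odds) > 0)

# plogis() is the inverse: it maps log-odds back to probabilities
all.equal(plogis(log_odds), p)
```

Because the mapping is one-to-one, a model for the log-odds determines the probability and vice versa.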

Your observations are essentially cases of the event happening or not. If there is a single observation for a given combination of values of the independent variables, then the outcome (happening or not) has a Bernoulli distribution. If there are several observations for that combination (typically when the independent variables are discrete rather than continuous), then the number of times the event occurs has a binomial distribution. The Bernoulli distribution is just the special case of the binomial distribution with one trial.
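To see that the Bernoulli and binomial views lead to the same fit, here is a rough sketch on simulated data. The data frame, the three levels of x, the group size of 50 and the true coefficients are all made up for illustration. glm() accepts either a 0/1 response or a two-column matrix of successes and failures.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical data: a discrete predictor with three levels, 50 observations each
x <- rep(c(0, 1, 2), each = 50)
y <- rbinom(length(x), size = 1, prob = plogis(-1 + 0.8 * x))
df <- data.frame(x = x, y = y)

# Bernoulli form: one row per observation, response is 0/1
fit_bernoulli <- glm(y ~ x, data = df, family = binomial(link = "logit"))

# Binomial form: one row per distinct x, response is (successes, failures)
grouped <- aggregate(y ~ x, data = df, FUN = sum)
names(grouped)[2] <- "successes"
grouped$failures <- 50 - grouped$successes
fit_binomial <- glm(cbind(successes, failures) ~ x, data = grouped,
                    family = binomial(link = "logit"))

# The coefficient estimates agree up to numerical tolerance
all.equal(coef(fit_bernoulli), coef(fit_binomial), tolerance = 1e-5)
```

The coefficients match because the two likelihoods differ only by a constant factor that does not depend on the parameters. The reported deviances differ, since the saturated models they are compared against are not the same.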

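So logistic regression estimates the probabilities which maximise the likelihood of the observations under a binomial model, subject to the constraints of the regression (the log-odds must be a linear function of the independent variables). This is what links logistic regression and binomial distributions. The parameters of these binomial distributions are unknown and have to be estimated, so what you specify is not one particular binomial distribution but the whole collection of them: the binomial family.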
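The following sketch makes the maximum-likelihood connection concrete: it writes down the binomial (size 1) log-likelihood with dbinom(), maximises it with the general-purpose optimiser optim(), and compares the result with glm(). The simulated data, the true coefficients 0.5 and 1.2, and the helper name neg_loglik are assumptions made for this example. glm() itself uses iteratively reweighted least squares rather than optim().

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Hypothetical simulated data
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(0.5 + 1.2 * x))

# Negative log-likelihood of independent binomial (size 1) observations
# whose success probability is plogis(beta[1] + beta[2] * x)
neg_loglik <- function(beta) {
  p <- plogis(beta[1] + beta[2] * x)
  -sum(dbinom(y, size = 1, prob = p, log = TRUE))
}

# Maximise the likelihood numerically, starting from (0, 0)
mle <- optim(c(0, 0), neg_loglik, method = "BFGS")

# Fit the same model with glm()
fit <- glm(y ~ x, family = binomial(link = "logit"))

# The two sets of estimates should agree closely
rbind(optim = mle$par, glm = unname(coef(fit)))
```

Any small difference between the two rows comes from the numerical optimiser's stopping rule, not from a difference in the model.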

glm in R fits other generalised linear models apart from logistic regression. See https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html for details on the other "families" that can be used, such as gaussian, poisson or Gamma. Each of these names a single type of distribution whose parameters are unknown, and so each makes up a family.
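As a quick way to explore these families, the sketch below inspects the objects returned by the family functions in the stats package. The variable names are chosen just for this example, and 0.3 is an arbitrary value for the mean.

```r
# Each family function returns an object describing a distribution type and its default link
fam_binomial <- binomial()
fam_poisson  <- poisson()
fam_gaussian <- gaussian()
fam_gamma    <- Gamma()

# Family name and default link for each
sapply(list(fam_binomial, fam_poisson, fam_gaussian, fam_gamma),
       function(f) c(family = f$family, link = f$link))

# The variance function reflects the distribution:
# mu * (1 - mu) for the binomial, mu for the Poisson
fam_binomial$variance(0.3)
fam_poisson$variance(0.3)
```

Passing a different family to glm() changes the distribution assumed for the response, and with it the likelihood being maximised.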
