# Solved – Why would they pick a gamma distribution here

In one of the exercises for my course, we're using a Kaggle medical dataset.

The exercise says:

we want to model the distribution of individual charges and we also really want to be able to capture our uncertainty about that distribution so we can better capture the range of values we might see. Loading the data and performing an initial view:

We may suspect from the above that there is some sort of exponential-like distribution at play here. …The insurance claim charges may possibly be multimodal. The gamma distribution may be applicable and we could test this for the distribution of charges that weren't insurance claims first.
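One way to connect the "exponential-like" observation to the gamma suggestion: the exponential distribution is exactly the gamma distribution with shape $\alpha = 1$, so the gamma family generalizes the exponential. The sketch below assumes the shape/rate parameterization (some texts use shape/scale instead), and the helper names `gamma_pdf` and `exponential_pdf` are hypothetical.

```python
import math

def gamma_pdf(x: float, shape: float, rate: float) -> float:
    """Gamma density (shape/rate parameterization, an assumption):
    f(x) = rate**shape * x**(shape - 1) * exp(-rate * x) / Gamma(shape), for x > 0."""
    if x <= 0:
        return 0.0
    # Work on the log scale for numerical stability, then exponentiate.
    log_f = (shape * math.log(rate) + (shape - 1) * math.log(x)
             - rate * x - math.lgamma(shape))
    return math.exp(log_f)

def exponential_pdf(x: float, rate: float) -> float:
    """Exponential density f(x) = rate * exp(-rate * x), for x > 0."""
    return rate * math.exp(-rate * x) if x > 0 else 0.0

# With shape = 1 the two densities agree at every x.
for x in (0.1, 1.0, 5.0):
    print(x, gamma_pdf(x, shape=1.0, rate=0.5), exponential_pdf(x, rate=0.5))
```

Shapes above 1 move the mode away from zero, which is one reason the gamma can fit data that looks "exponential-like" but not exactly exponential.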

I looked up "Gamma distribution" and found "a continuous, positive-only, unimodal distribution that encodes the time required for $\alpha$ events to occur in a Poisson process with mean arrival time of $\beta$".
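The waiting-time description in that quote can be checked by simulation: add up $\alpha$ independent exponential inter-arrival times and the total should follow a gamma distribution. The quote treats $\beta$ as a mean (a scale), so this sketch uses the shape/scale parameterization, under which the sum has mean $\alpha\beta$ and variance $\alpha\beta^2$. Python's `random.gammavariate(alpha, beta)` also treats `beta` as the scale. The function name `waiting_time` is hypothetical.

```python
import random
import statistics

def waiting_time(alpha: int, beta: float, rng: random.Random) -> float:
    """Total time until `alpha` events occur in a Poisson process whose
    inter-arrival times are exponential with mean `beta`."""
    # expovariate takes a *rate*, so pass 1 / mean.
    return sum(rng.expovariate(1.0 / beta) for _ in range(alpha))

rng = random.Random(0)
alpha, beta, n = 3, 2.0, 50_000

simulated = [waiting_time(alpha, beta, rng) for _ in range(n)]
direct = [rng.gammavariate(alpha, beta) for _ in range(n)]

# Theory for Gamma(shape=alpha, scale=beta): mean = alpha*beta, variance = alpha*beta**2.
print("theory   :", alpha * beta, alpha * beta ** 2)
print("summed   :", statistics.fmean(simulated), statistics.variance(simulated))
print("gammavar :", statistics.fmean(direct), statistics.variance(direct))
```

The summed waiting times and the direct gamma draws should agree closely in mean and variance, which is the "time until $\alpha$ events" interpretation in action.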

There's no time involved here, just unrelated charges, either insured or not.

Why would they choose a gamma distribution?


• As your question points out, one way a Gamma distribution arises is as the distribution of the waiting time until $n$ independent events occur in a process with a constant rate $\lambda$ (i.e., the sum of $n$ independent exponential waiting times, each with mean $1/\lambda$). I can't easily find a reference for a mechanistic model of Gamma-distributed insurance claims, but a Gamma distribution also makes sense from a phenomenological (i.e., data description and computational convenience) point of view: it is positive-only and can be strongly right-skewed, which matches the exponential-like shape the exercise describes. The Gamma distribution belongs to the exponential dispersion family that underpins generalized linear models (a family that includes the Normal but not the log-Normal), so all of the GLM machinery is available; it also has a particularly convenient form for analysis.
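To illustrate the "data description" side, here is a minimal sketch that fits a gamma distribution to positive, right-skewed values by the method of moments. This is a simple estimator swapped in for illustration; in practice you would more likely use maximum likelihood or a GLM with a Gamma family (statsmodels, for example, provides one). The data are synthetic draws from a known gamma, not the Kaggle charges, and `fit_gamma_moments` is a hypothetical helper.

```python
import random
import statistics

def fit_gamma_moments(data: list[float]) -> tuple[float, float]:
    """Method-of-moments estimates (shape, scale) for a gamma distribution.

    For Gamma(shape=k, scale=theta): mean = k*theta and variance = k*theta**2,
    so k = mean**2 / variance and theta = variance / mean.
    """
    if any(x <= 0 for x in data):
        raise ValueError("gamma is supported on positive values only")
    m = statistics.fmean(data)
    v = statistics.variance(data)
    return m * m / v, v / m

# Synthetic, right-skewed "charges" drawn from a known gamma (illustrative only).
rng = random.Random(42)
true_shape, true_scale = 2.0, 5000.0
charges = [rng.gammavariate(true_shape, true_scale) for _ in range(20_000)]

shape_hat, scale_hat = fit_gamma_moments(charges)
print(f"shape ~ {shape_hat:.2f}, scale ~ {scale_hat:.0f}")
```

The recovered shape and scale land close to the values used to generate the data; on real charges, comparing the fitted density against a histogram is a quick first check of whether the gamma is a reasonable description.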