Why is there a need for a ‘sampling distribution’ to find confidence intervals?

I understand the key principles behind confidence intervals, but there's something I want a bit of clarification on. Let's say I have a basket of apples that I picked at the orchard. The weight is 110g with some standard deviation. Now, let's say the population mean (which I understand we wouldn't know and it's what we're trying to estimate) is 100g. We are constructing a sampling distribution of the sample mean based on the 110g. Because of the Central Limit Theorem, the mean of the sampling distribution would also be 110g.

However, because the population mean is 100g, the sampling distribution of the sample mean will have a mean of 100g. So the only reason why we create the theoretical sampling distribution is so we can "capture" the population parameter 95% of the time?

1. Goal of using confidence intervals

As you correctly stated, the rationale behind confidence intervals is to get an idea about the value of some unknown parameter, in your case the mean weight of a basket of apples. One way to find out is to weigh each and every existing basket and then compute the average.

Obviously, when there is a huge number of baskets to weigh, this can be very time-consuming and expensive. Therefore, in many practical cases we would rather measure a limited number of baskets and use this so-called sample to estimate the mean weight of all the baskets.

Because we only estimate this overall (unknown) mean, we will make an estimation error. We therefore want not only an estimate of the overall mean but also an idea of the size of that estimation error. This is where the notion of a confidence interval comes in: by expressing our estimate as an interval, we also convey its precision.

2. How can we find such an estimate and its (im)precision? The sampling distribution of the sample average

For the problem of finding confidence intervals for the mean, the basic theorem used to derive them is, as you said, the central limit theorem (CLT). The CLT states that, under fairly general conditions, the arithmetic average of a large number of independent random variables has approximately a normal distribution. Let us simplify a bit and assume that the weights of your apple baskets have a distribution with unknown mean \$\mu\$ and known standard deviation \$\sigma\$ (if the standard deviation is unknown the idea is the same, but there are some minor differences because it has to be estimated from your sample; see the end of this section). If I denote the weight of a basket by \$W\$ and take it to be normally distributed, this is written as \$W \sim N(\mu, \sigma)\$, where the second parameter is the standard deviation.

Now you take a sample of \$n\$ baskets, with weights \$w_i, i = 1, \dots, n\$.

The CLT says that, if we pick \$n\$ baskets randomly (independently) out of the set of all the baskets, then the arithmetic average of the weights in the sample, \$\bar{w}=\frac{1}{n} \sum_{i=1}^n w_i\$, is (a draw from) a random variable that, to a good approximation when \$n\$ is large:

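1. has a normal distribution;
2. has a mean equal to the overall mean of all the baskets;
3. has a standard deviation equal to the standard deviation of all the baskets divided by the square root of the sample size (equivalently, a variance equal to the variance of all the baskets divided by the sample size).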

So \$\bar{w}\$ is (approximately) a draw from \$N(\mu, \frac{\sigma}{\sqrt{n}})\$.

(Note: for large \$n\$ this holds even if the random variable \$W\$ itself is not normally distributed.)

\$N(\mu, \frac{\sigma}{\sqrt{n}})\$ is called the sampling distribution of the sample average: it is the distribution of the averages of all possible samples of size \$n\$.
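The claim about the sampling distribution can be checked with a small simulation. The sketch below is only an illustration under assumed values: the exponential population with mean 100 g, the sample size of 50, the seed, and the helper name `simulate_sample_means` are all made up here and do not come from the question.

```python
import math
import random
import statistics

def simulate_sample_means(n, num_samples, rng):
    """Draw num_samples samples of size n and return the average of each one."""
    means = []
    for _ in range(num_samples):
        # Exponential weights with mean 100 g: a clearly non-normal population.
        sample = [rng.expovariate(1 / 100) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

rng = random.Random(0)            # fixed seed so the run is reproducible
mu, sigma, n = 100.0, 100.0, 50   # an exponential distribution has sigma equal to mu
means = simulate_sample_means(n, 20_000, rng)

print("mean of the sample averages:", statistics.fmean(means))  # close to mu
print("sd of the sample averages:  ", statistics.stdev(means))  # close to sigma / sqrt(n)
print("sigma / sqrt(n):            ", sigma / math.sqrt(n))
```

Even though the individual weights are far from normal here, the averages cluster around \$\mu\$ with a spread close to \$\frac{\sigma}{\sqrt{n}}\$.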

What do we have now:

1. we have the weights \$w_i\$ of the \$n\$ baskets in our sample, so we can compute \$\bar{w}\$;
2. from the CLT we know that the probabilities of observing a value \$\bar{w}\$ for our (randomly drawn) sample can be computed using the density of the normal distribution.

E.g. it is well known that, as the sampling distribution of the sample average is normal, the probability that \$\bar{w}\$ lies in \$[\mu-1.96\frac{\sigma}{\sqrt{n}};\mu+1.96\frac{\sigma}{\sqrt{n}}]\$ is \$95\%\$.

Let us analyse this in detail: \$\bar{w} \in [\mu-1.96\frac{\sigma}{\sqrt{n}};\mu+1.96\frac{\sigma}{\sqrt{n}}]\$ is equivalent to saying that \$\bar{w} \ge \mu-1.96\frac{\sigma}{\sqrt{n}}\$ and \$\bar{w} \le \mu+1.96\frac{\sigma}{\sqrt{n}}\$.

The first inequality can be rewritten as \$\bar{w} \ge \mu-1.96\frac{\sigma}{\sqrt{n}} \iff \bar{w} + 1.96\frac{\sigma}{\sqrt{n}} \ge \mu\$ and the second inequality as \$\bar{w} \le \mu+1.96\frac{\sigma}{\sqrt{n}} \iff \bar{w} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu\$.

Therefore \$\bar{w} \in [\mu-1.96\frac{\sigma}{\sqrt{n}};\mu+1.96\frac{\sigma}{\sqrt{n}}]\$ is equivalent to \$\mu \in [\bar{w}-1.96\frac{\sigma}{\sqrt{n}};\bar{w}+1.96\frac{\sigma}{\sqrt{n}}]\$, and obviously it must then hold that the probability that \$\bar{w} \in [\mu-1.96\frac{\sigma}{\sqrt{n}};\mu+1.96\frac{\sigma}{\sqrt{n}}]\$ (which was \$95\%\$, see supra) is equal to the probability that \$\mu \in [\bar{w}-1.96\frac{\sigma}{\sqrt{n}};\bar{w}+1.96\frac{\sigma}{\sqrt{n}}]\$.

So we find that the probability that \$\mu \in [\bar{w}-1.96\frac{\sigma}{\sqrt{n}};\bar{w}+1.96\frac{\sigma}{\sqrt{n}}]\$ is \$95\%\$.

This means that the interval \$[\bar{w}-1.96\frac{\sigma}{\sqrt{n}};\bar{w}+1.96\frac{\sigma}{\sqrt{n}}]\$, which is random because \$\bar{w}\$ is random, contains the unknown mean of all baskets \$\mu\$ with a probability of \$95\%\$, and we can compute this interval from our sample (because \$\bar{w}\$ can be computed from the sample and we assumed that \$\sigma\$ is known).

So we are confident, at a \$95\%\$ confidence level, that the unknown mean \$\mu\$ lies in \$[\bar{w}-1.96\frac{\sigma}{\sqrt{n}};\bar{w}+1.96\frac{\sigma}{\sqrt{n}}]\$.

Note that this interval could only be derived because we knew the sampling distribution of the sample average.
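As a concrete illustration of the formula, here is a minimal sketch that computes the interval for a known \$\sigma\$. The eight weights, the value \$\sigma = 10\$ and the function name `z_confidence_interval` are assumptions made up for this example.

```python
import math
import statistics

def z_confidence_interval(weights, sigma, level=0.95):
    """Confidence interval for the mean when the population sd (sigma) is known."""
    n = len(weights)
    w_bar = statistics.fmean(weights)
    # Standard normal quantile; about 1.96 for a 95% level.
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / math.sqrt(n)
    return w_bar - half_width, w_bar + half_width

# Made-up basket weights in grams; their average is 110 g.
weights = [112.0, 98.5, 120.3, 105.1, 109.8, 101.2, 117.4, 115.7]

low, high = z_confidence_interval(weights, sigma=10.0)
print(f"{low:.2f} {high:.2f}")  # prints "103.07 116.93"
```

The interval is centred on the sample average, and its half-width depends only on \$\sigma\$, \$n\$ and the chosen confidence level.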

P.S. If the standard deviation \$\sigma\$ is not known, it has to be estimated from the sample, i.e. it is replaced by the sample standard deviation \$s\$. This has consequences for the sampling distribution: the standardized sample mean is no longer normal but (for normally distributed weights) follows a \$t\$-distribution with \$n-1\$ degrees of freedom, i.e. \$\frac{\bar{w}-\mu}{s/\sqrt{n}} \sim t(\nu=n-1)\$. The reasoning supra remains the same; the only differences are that the factor \$1.96\$ is replaced by the corresponding quantile of the \$t\$-distribution and, as said, \$\sigma\$ is replaced by \$s\$.
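A small sketch of the unknown-\$\sigma\$ case follows. The weights are made up, and the quantile 2.365 (97.5% point of the \$t\$-distribution with 7 degrees of freedom) is copied from a standard \$t\$-table rather than computed, since the Python standard library has no \$t\$-quantile function; with SciPy installed, `scipy.stats.t.ppf(0.975, 7)` would return it.

```python
import math
import statistics

# Made-up basket weights in grams (n = 8), purely for illustration.
weights = [112.0, 98.5, 120.3, 105.1, 109.8, 101.2, 117.4, 115.7]

n = len(weights)
w_bar = statistics.fmean(weights)
s = statistics.stdev(weights)   # sample standard deviation (divides by n - 1)

# Assumption: 97.5% quantile of t with n - 1 = 7 degrees of freedom,
# taken from a t-table instead of being computed here.
T_QUANTILE_DF7 = 2.365

t_half_width = T_QUANTILE_DF7 * s / math.sqrt(n)
z_half_width = 1.96 * s / math.sqrt(n)   # what the normal factor would give

t_interval = (w_bar - t_half_width, w_bar + t_half_width)
print("t-based interval:", t_interval)
print("t half-width:", t_half_width, "normal half-width:", z_half_width)
```

The \$t\$-based interval is wider than the one using 1.96, which reflects the extra uncertainty from estimating \$\sigma\$ by \$s\$.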

3. Interpretation of a confidence interval.

It is very important to note that \$\mu\$ is unknown but is a fixed number; it is the mean of all baskets, and that number could in principle be known. We do not know it only because we decided to estimate it from a sample for reasons of time and cost! In order to estimate the unknown \$\mu\$, we have drawn a random sample. If we take another sample, then we will obviously find another value for \$\bar{w}\$ and thus another confidence interval!

So the overall mean is an unknown but fixed number, while the interval is computed from a random sample and therefore the interval is random!

So what does a '\$95\%\$' confidence interval mean? It means that, if we draw an infinite number of random samples, and for each sample we compute the \$95\%\$ confidence interval, then we find an infinite number of confidence intervals, and \$95\%\$ of all these intervals will contain the unknown \$\mu\$.
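This repeated-sampling interpretation can be mimicked with a finite number of samples. In the sketch below the population values (\$\mu = 100\$, \$\sigma = 15\$), the sample size, the seed and the function name `coverage` are assumptions chosen for illustration; in a simulation we are allowed to know \$\mu\$, which is exactly what lets us count the hits.

```python
import math
import random
import statistics

def coverage(mu, sigma, n, num_samples, rng, z=1.96):
    """Fraction of known-sigma 95% intervals that contain the true mean mu."""
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(num_samples):
        # Draw a fresh random sample and build its interval around the sample average.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        w_bar = statistics.fmean(sample)
        if w_bar - half_width <= mu <= w_bar + half_width:
            hits += 1
    return hits / num_samples

rng = random.Random(1)   # fixed seed so the run is reproducible
frac = coverage(mu=100.0, sigma=15.0, n=25, num_samples=10_000, rng=rng)
print("fraction of intervals containing mu:", frac)  # close to 0.95
```

Each individual interval either contains \$\mu\$ or it does not; the \$95\%\$ describes how often the procedure succeeds over many samples.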

4. Remark on the 'precision' and the sample size

By the above, it can be seen that the confidence interval will be narrower (and thus the precision of the estimate will be higher) when \$\frac{\sigma}{\sqrt{n}}\$ becomes smaller, i.e. when \$n\$ becomes larger.

For completeness I remark that if your population (i.e. the set of all the baskets) is not 'infinite', a 'finite population correction' can be applied. The idea is simple: if the total number of baskets in the whole population is \$N\$ and I draw a sample of size \$n\$ equal to the population size \$N\$, then there is no imprecision anymore (because I weighed all available baskets), so in that case the confidence interval should reduce to a single value: \$\mu\$. This is not the case if we simply apply the above formulas; for \$n=N\$ they become:

\$[\bar{w}-1.96\frac{\sigma}{\sqrt{N}};\bar{w}+1.96\frac{\sigma}{\sqrt{N}}]\$,

which is not a single value, contrary to what we expect.

Therefore, for finite populations of size \$N\$, the \$\sigma\$ in the above formulas under section 2 should be replaced by \$\sqrt{1-\frac{n}{N}}\sigma\$ and the interval becomes: \$[\bar{w}-1.96\frac{\sigma \sqrt{1-\frac{n}{N}}}{\sqrt{n}};\bar{w}+1.96\frac{\sigma \sqrt{1-\frac{n}{N}}}{\sqrt{n}}]\$.

It is easily verified that for \$n=N\$ this reduces to the singleton \$\bar{w}\$, and as we have been exhaustive, \$\bar{w}=\mu\$.

If \$N\$ is infinite, \$1-\frac{n}{N}=1\$, and we recover the formulas derived under section 2.
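The effect of the correction can be sketched in a few lines. The function name `fpc_interval`, the convention that `N=None` stands for an infinite population, and the numbers in the usage lines are assumptions for illustration; the correction factor is the \$\sqrt{1-\frac{n}{N}}\$ used above.

```python
import math

def fpc_interval(w_bar, sigma, n, N=None, z=1.96):
    """Known-sigma interval for the mean, with an optional finite population correction.

    N=None is used here to mean an infinite population (no correction).
    """
    if N is not None and not 0 < n <= N:
        raise ValueError("sample size n must satisfy 0 < n <= N")
    correction = 1.0 if N is None else math.sqrt(1 - n / N)
    half_width = z * sigma * correction / math.sqrt(n)
    return w_bar - half_width, w_bar + half_width

infinite = fpc_interval(110.0, 10.0, n=50)           # no correction
finite = fpc_interval(110.0, 10.0, n=50, N=200)      # narrower: 25% of the population sampled
exhaustive = fpc_interval(110.0, 10.0, n=50, N=50)   # every basket weighed

print(infinite)
print(finite)
print(exhaustive)  # prints (110.0, 110.0)
```

With every basket weighed the interval collapses to the single value \$\bar{w}\$, as argued above, and the larger \$N\$ is relative to \$n\$, the closer the corrected interval gets to the uncorrected one.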
