Calculating sampling bias when estimating population proportions

Assume a true population of size N consisting of red and blue balls, where p represents the frequency of red in the population and q represents the frequency of blue in the population.

Let's say that N=1000, p=0.7 and q=0.3.

If I take a sample from the population and observe the frequencies in the sample to infer the true population proportions, the smaller the sample the more variance there will be.
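To make that spread concrete, here is a minimal simulation sketch in Python using only the standard library. The helper name `sample_proportions`, the fixed seed, and the choice of 2000 repeated samples per sample size are assumptions made for illustration; they are not part of the original setup, which only fixes N=1000, p=0.7 and q=0.3.

```python
import random
import statistics

def sample_proportions(population, n, trials, rng):
    """Draw `trials` samples of size n without replacement and
    return the observed share of red balls in each sample."""
    return [rng.sample(population, n).count("red") / n for _ in range(trials)]

rng = random.Random(0)  # fixed seed (arbitrary) so reruns give the same numbers
population = ["red"] * 700 + ["blue"] * 300  # N=1000, p=0.7, q=0.3

for n in (10, 50, 500):
    estimates = sample_proportions(population, n, trials=2000, rng=rng)
    mean = statistics.mean(estimates)
    spread = statistics.pstdev(estimates)
    print(f"n={n:3d}  mean of estimates={mean:.3f}  sd of estimates={spread:.3f}")
```

The standard deviation of the estimates should shrink as n grows, while their mean stays close to 0.7 for every n.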

Is there a way to calculate the extent to which different sample sizes will over or underestimate the true proportions?


There is: you compute the bias $B(\hat p)=E(\hat p-p)=E(\hat p)-p$ (here $p$ is the population proportion).

In typical situations, this bias is zero.

Sampling with replacement:

Specifically, if $R$ is the number of reds in a sample of size $n$ and $\hat p=R/n$ is the sample estimate of $p$, and if we're sampling with replacement, then under the usual assumptions $R$ has a binomial distribution; $R\sim \text{binomial}(n,p)$.

$E(\hat p) = \frac{1}{n} E(R)=\frac{1}{n} n\cdot p=p$
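As a numerical check of this identity, the following sketch computes $E(\hat p)$ directly from the binomial probabilities rather than by simulation. The function names `binomial_pmf` and `expected_p_hat_with_replacement` and the sample sizes tried are illustrative assumptions, not anything from the original answer.

```python
from math import comb

def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(R = r) when R ~ binomial(n, p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def expected_p_hat_with_replacement(n: int, p: float) -> float:
    """E(p_hat) computed as the sum over r of (r / n) * P(R = r)."""
    return sum((r / n) * binomial_pmf(r, n, p) for r in range(n + 1))

p = 0.7
for n in (5, 20, 100):
    bias = expected_p_hat_with_replacement(n, p) - p
    print(f"n={n:3d}  bias={bias:+.2e}")
```

Any nonzero bias printed here should be at the level of floating-point rounding error, regardless of $n$.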

Sampling without replacement:

On the other hand, if $R$ is the number of reds in a sample of size $n$ and $\hat p=R/n$ is the sample estimate of $p$, and if we're sampling without replacement, then under the usual assumptions $R$ will have a hypergeometric distribution … and, as it turns out, the bias is still 0.

In your problem $N=1000, p=0.7$ and $q=0.3$; in the notation at that link above, $k=R$, $K=700$, $N=1000$. Again, the sample is size $n$.

As given at that link, $E(R)=nK/N = np$, and so $E(\hat p) = E(R)/n = p$; i.e. the bias is 0.
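The same kind of check can be sketched for the without-replacement case, using the hypergeometric probabilities with $N=1000$ and $K=700$ from the question. The function names `hypergeometric_pmf` and `expected_p_hat_without_replacement` and the sample sizes tried are assumptions made for illustration.

```python
from math import comb

def hypergeometric_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(R = k): k reds in n draws without replacement
    from a population of N balls of which K are red."""
    # math.comb returns 0 when asked to choose more items than are available,
    # so impossible values of k automatically get probability 0.
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def expected_p_hat_without_replacement(N: int, K: int, n: int) -> float:
    """E(p_hat) computed as the sum over k of (k / n) * P(R = k)."""
    return sum((k / n) * hypergeometric_pmf(k, N, K, n) for k in range(n + 1))

N, K = 1000, 700  # population from the question, so p = K / N = 0.7
for n in (5, 20, 100):
    bias = expected_p_hat_without_replacement(N, K, n) - K / N
    print(f"n={n:3d}  bias={bias:+.2e}")
```

As in the with-replacement case, the printed bias should differ from zero only by floating-point rounding error.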

Of course, if the bias in estimating $p$ is zero, it is also zero when estimating $q=1-p$.
