# Solved – Calculating sampling bias when estimating population proportions

Assume a true population of size N consisting of red and blue balls, where p represents the frequency of red in the population and q represents the frequency of blue in the population.

Let's say that N=1000, p=0.7 and q=0.3.

If I take a sample from the population and observe the frequencies in the sample to infer the true population proportions, the smaller the sample the more variance there will be.

Is there a way to calculate the extent to which different sample sizes will over or underestimate the true proportions?

> Is there a way to calculate the extent to which different sample sizes will over or underestimate the true proportions?

There is: you compute the bias $B(\hat p)=E(\hat p-p)=E(\hat p)-p$ (here $p$ is the population proportion).

In typical situations, this bias is zero.

Sampling with replacement:

Specifically, if $R$ is the number of reds in a sample of size $n$ and $\hat p=R/n$ is the sample estimate of $p$, and if we're sampling with replacement, then under the usual assumptions $R$ has a binomial distribution: $R\sim \text{binomial}(n,p)$.

$E(\hat p) = \frac{1}{n} E(R)=\frac{1}{n}\, n\cdot p=p$
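As a quick numerical check of the identity above, here is a minimal sketch that sums $r/n$ against the binomial probabilities directly. The function name `expected_p_hat_binomial` and the sample sizes tried are illustrative choices, not part of the original answer.

```python
from math import comb

def expected_p_hat_binomial(n, p):
    """Exact E(R/n) for R ~ binomial(n, p), summed over the pmf."""
    return sum(
        (r / n) * comb(n, r) * p**r * (1 - p) ** (n - r)
        for r in range(n + 1)
    )

p = 0.7  # population proportion of red from the question
# Sample sizes below are arbitrary illustrations.
biases = {n: expected_p_hat_binomial(n, p) - p for n in (5, 20, 100)}

for n, b in biases.items():
    # Any nonzero value here is floating-point rounding, not real bias.
    print(f"n={n:>3}  bias={b:+.2e}")
```

The bias comes out as zero (up to rounding) at every sample size, which is what the algebra says it should.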

Sampling without replacement:

On the other hand, if $R$ is the number of reds in a sample of size $n$ and $\hat p=R/n$ is the sample estimate of $p$, and if we're sampling without replacement, then under the usual assumptions $R$ will have a hypergeometric distribution … and, as it turns out, the bias is still 0.

In your problem $N=1000$, $p=0.7$ and $q=0.3$; in the notation at that link above, $k=R$, $K=700$, $N=1000$. Again, the sample has size $n$.

As given at that link, $E(R)=nK/N = np$, and so $E(\hat p) = E(R)/n = p$; i.e. the bias is 0.
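The same check can be done for sampling without replacement, using the question's numbers ($N=1000$, $K=700$). This is only a sketch: the function name `expected_p_hat_hypergeometric` and the sample sizes are assumptions made for illustration, and exact rational arithmetic is used so that no rounding enters.

```python
from fractions import Fraction
from math import comb

def expected_p_hat_hypergeometric(N, K, n):
    """Exact E(R/n) when R counts reds in n draws without replacement
    from N balls of which K are red (hypergeometric pmf)."""
    total = comb(N, n)  # number of equally likely samples of size n
    return sum(
        Fraction(r, n) * Fraction(comb(K, r) * comb(N - K, n - r), total)
        for r in range(n + 1)
    )

N, K = 1000, 700  # population size and number of reds from the question
# Sample sizes below are arbitrary illustrations.
expectations = {n: expected_p_hat_hypergeometric(N, K, n) for n in (5, 20, 100)}

for n, e in expectations.items():
    print(f"n={n:>3}  E(p_hat)={e}  bias={e - Fraction(K, N)}")
```

Each expectation equals $K/N = 7/10$ exactly, so the bias is 0 regardless of $n$.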

Of course, if the bias in estimating $p$ is zero, it is also zero when estimating $q=1-p$.
