Assume a true population of size N consisting of red and blue balls, where p represents the frequency of red in the population and q represents the frequency of blue in the population.
Let's say that N=1000, p=0.7 and q=0.3.
If I take a sample from the population and observe the frequencies in the sample to infer the true population proportions, the smaller the sample the more variance there will be.
Is there a way to calculate the extent to which different sample sizes will over or underestimate the true proportions?
Best Answer
> Is there a way to calculate the extent to which different sample sizes will over or underestimate the true proportions?
There is: you compute the bias $B(\hat p)=E(\hat p-p)=E(\hat p)-p$ (here $p$ is the population proportion and $\hat p$ is the sample estimate of it).
In typical situations, this bias is zero.
Sampling with replacement:
Specifically, if $R$ is the number of reds in a sample of size $n$ and $\hat p=R/n$ is the sample estimate of $p$, and if we're sampling with replacement, then under the usual assumptions $R$ has a binomial distribution: $R\sim \text{binomial}(n,p)$.
$E(\hat p) = \frac{1}{n} E(R)=\frac{1}{n}\, n\cdot p=p$, so the bias $E(\hat p)-p$ is 0.
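As a check on the with-replacement case, here is a minimal sketch that sums the binomial probabilities exactly (using rational arithmetic, so there is no rounding error) and subtracts $p$. The function name `expected_p_hat_with_replacement` and the sample sizes tried are my own choices for illustration, not part of the question.

```python
from fractions import Fraction
from math import comb


def expected_p_hat_with_replacement(n, p):
    """Exact E(p_hat) when R ~ binomial(n, p) and p_hat = R / n."""
    q = 1 - p
    # Sum (r / n) * P(R = r) over every possible count of reds r = 0..n.
    return sum(
        Fraction(r, n) * comb(n, r) * p**r * q ** (n - r)
        for r in range(n + 1)
    )


p = Fraction(7, 10)  # population proportion of red, from the question

# Sample sizes below are arbitrary illustrative choices.
for n in (5, 20, 100):
    bias = expected_p_hat_with_replacement(n, p) - p
    print(n, bias)  # the bias is exactly 0 for each n
```

The bias comes out as exactly zero regardless of $n$; what changes with the sample size is the spread of $\hat p$ around $p$, not its average.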
Sampling without replacement:
On the other hand, if $R$ is the number of reds in a sample of size $n$ and $\hat p=R/n$ is the sample estimate of $p$, and if we're sampling without replacement, then under the usual assumptions $R$ will have a hypergeometric distribution … and, as it turns out, the bias is still 0.
In your problem $N=1000, p=0.7$ and $q=0.3$; in the notation at that link above, $k=R$, $K=700$, $N=1000$. Again, the sample is size $n$.
As given at that link, $E(R)=nK/N = np$, and so $E(\hat p) = E(R)/n = p$; i.e. the bias is 0.
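The same kind of exact check can be sketched for sampling without replacement, using the hypergeometric probabilities $P(R=r)=\binom{K}{r}\binom{N-K}{n-r}/\binom{N}{n}$ with the question's $N=1000$ and $K=700$. The function name `expected_p_hat_without_replacement` and the sample sizes are assumptions made for illustration; the sketch also assumes $n \le N$.

```python
from fractions import Fraction
from math import comb


def expected_p_hat_without_replacement(n, K, N):
    """Exact E(p_hat) when R ~ hypergeometric(N, K, n) and p_hat = R / n.

    Assumes 1 <= n <= N, so that comb(N, n) is nonzero.
    """
    total = comb(N, n)  # number of equally likely samples of size n
    # math.comb returns 0 when r > K or n - r > N - K, so impossible
    # values of r contribute nothing to the sum.
    return sum(
        Fraction(r, n) * Fraction(comb(K, r) * comb(N - K, n - r), total)
        for r in range(n + 1)
    )


N, K = 1000, 700  # population size and number of reds, from the question
p = Fraction(K, N)

# Sample sizes below are arbitrary illustrative choices.
for n in (5, 50, 500):
    bias = expected_p_hat_without_replacement(n, K, N) - p
    print(n, bias)  # the bias is exactly 0 for each n
```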
Of course, if the bias in estimating $p$ is zero, it is also zero when estimating $q=1-p$, since $\hat q = 1-\hat p$.