# Solved – Expectation of the variance of the sampling set without replacement

Select \$n\$ numbers without replacement from the set \$\{1,2,\ldots,m\}\$ to form the sample \$S=\{a_1,a_2,\ldots,a_n\}\$. I want to calculate the expected variance of the sample, \$\mathbb{E}[\operatorname{Var}(S)]\$, and the maximum variance over all possible samples, \$\max\{\operatorname{Var}(S)\}\$.

Besides, what's the distribution of the sample variance?

We know that

\$\$\widehat{\operatorname{Var}}(\mathbf{a}) = \frac{1}{n-1}\left(\sum_{i=1}^n a_i^2 - \frac{1}{n}\left(\sum_{i=1}^n a_i \right)^2 \right)\$\$

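is an unbiased estimator of the population variance, even when sampling without replacement, provided the population variance is also defined with the divisor \$m-1\$. For the population \$\{1,2,\ldots,m\}\$, whose mean is \$(m+1)/2\$, the sum of squared deviations is \$m(m^2-1)/12\$, so the population variance is easily computed as \$(m+1)m/12\$. This, therefore, answers the first question concerning the expected variance.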
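As a sanity check on the unbiasedness claim, here is a minimal sketch that averages the sample variance over every \$n\$-element subset of \$\{1,\ldots,m\}\$ and compares the result with \$(m+1)m/12\$. The function names `sample_variance` and `expected_sample_variance` are illustrative, not from the original answer. Exact rational arithmetic is used so the comparison is not affected by rounding.

```python
from fractions import Fraction
from itertools import combinations


def sample_variance(values):
    """Sample variance with divisor n - 1, computed exactly."""
    n = len(values)
    total = sum(values)
    total_sq = sum(v * v for v in values)
    # (1/(n-1)) * (sum a_i^2 - (1/n) (sum a_i)^2), put over a common denominator
    return Fraction(n * total_sq - total * total, n * (n - 1))


def expected_sample_variance(m, n):
    """Average the sample variance over all n-subsets of {1, ..., m}.

    Every subset is equally likely under sampling without replacement,
    so the plain average is the expectation.
    """
    variances = [sample_variance(s) for s in combinations(range(1, m + 1), n)]
    return sum(variances, Fraction(0)) / len(variances)


# Brute-force check for a few small (m, n) pairs chosen for illustration.
for m, n in [(5, 2), (6, 3), (8, 4)]:
    assert expected_sample_variance(m, n) == Fraction(m * (m + 1), 12)
```

The enumeration grows as \$\binom{m}{n}\$, so this is only practical for small \$m\$; it is meant as a check, not as a way to compute the expectation.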

I will only sketch how to maximize the variance. I claim it is maximized when the \$a_i\$ are in two contiguous blocks: that is, \$\mathbf{a}\$ has the form

\$\$\mathbf{a} = (1, 2, \ldots, k, m-l+1, m-l+2, \ldots, m).\$\$

(Evidently \$k+l = n\$.) To prove this claim, suppose \$\mathbf{a}\$ is not of this form. Then there is a gap in one of the end sequences, and moving one of the components of \$\mathbf{a}\$ into that gap increases the variance. It remains only to maximize the variance among these special forms of \$\mathbf{a}\$. This is done by making the two end sequences as balanced in length as possible: set \$k=l\$ when \$n\$ is even, and otherwise set either \$k=l+1\$ or \$l=k+1\$. When \$n=2k\$ is even, the maximum variance equals

\$\$ n \frac{\left(3 m^2-3 m n+n^2 -1\right)}{12 (n-1)}.\$\$

When \$n=2k+1\$ is odd, the maximum variance is

\$\$(n+1) \frac{\left(3 m^2-3 m n+n^2\right)}{12 n}.\$\$
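The two closed forms can be checked against an exhaustive search for small cases. The following is a rough sketch under that assumption; the helper names `brute_force_max_variance` and `closed_form_max_variance` are hypothetical and introduced only for this check. For example, with \$m=5\$ and \$n=3\$ the sample \$(1,2,5)\$ has variance \$13/3\$, which agrees with the odd-\$n\$ formula.

```python
from fractions import Fraction
from itertools import combinations


def sample_variance(values):
    """Sample variance with divisor n - 1, computed exactly."""
    n = len(values)
    total = sum(values)
    total_sq = sum(v * v for v in values)
    return Fraction(n * total_sq - total * total, n * (n - 1))


def brute_force_max_variance(m, n):
    """Largest sample variance over all n-subsets of {1, ..., m}."""
    return max(sample_variance(s) for s in combinations(range(1, m + 1), n))


def closed_form_max_variance(m, n):
    """The even-n and odd-n formulas for the maximum variance."""
    if n % 2 == 0:
        return Fraction(n * (3 * m * m - 3 * m * n + n * n - 1), 12 * (n - 1))
    return Fraction((n + 1) * (3 * m * m - 3 * m * n + n * n), 12 * n)


# Compare both approaches for every 2 <= n <= m < 10
# (n = 1 is skipped because the sample variance is undefined there).
for m in range(2, 10):
    for n in range(2, m + 1):
        assert brute_force_max_variance(m, n) == closed_form_max_variance(m, n)
```

Agreement on these small cases supports the formulas but does not replace the argument sketched above.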
