# Bounds for the population variance

Suppose we have i.i.d. samples \$x_1\$, \$\ldots\$, \$x_n\$ for a (potentially non-normal) random variable \$X\$ with finite moments. We can use these samples to construct unbiased estimates of the population mean and population variance
\$\$
\bar{x} = n^{-1} \sum_{i=1}^n x_i \qquad\text{and}\qquad s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \enspace.
\$\$
Without making any assumptions on the distribution of \$X\$, it is possible to construct probabilistic bounds on the population mean by using Chebyshev's inequality (see, e.g., Wikipedia or the original paper).

My question is: do such probabilistic bounds exist for the population variance? In other words, can we say that with probability \$\delta\$ the population variance \$\sigma^2\$ will be in some interval \$[L(\delta,\{x_i\}),U(\delta,\{x_i\})]\$? And if so, what are the functions \$L\$ and \$U\$ that describe the lower and upper bound?

For normal distributions the sample variance follows a \$\sigma^2 \chi^2_{n-1} (n-1)^{-1}\$ distribution. This can be used to construct confidence intervals. However, I am looking for more general bounds that apply also to non-normal settings.


The general result for the asymptotic distribution of the sample variance is (see this post)

\$\$\sqrt n(\hat v - v) \xrightarrow{d} N\left(0,\mu_4 - v^2\right)\$\$

where I have used the notation \$v\equiv \sigma^2\$ to avoid later confusion with squares, and where \$\mu_4 = \mathrm{E}\left((X_i -\mu)^4\right)\$. Therefore, by the continuous mapping theorem,

\$\$\frac {n(\hat v - v)^2}{\mu_4 - v^2} \xrightarrow{d} \chi^2_1 \$\$

Then, accepting the approximation,

\$\$P\left(\frac {n(\hat v - v)^2}{\mu_4 - v^2}\leq \chi^2_{1,1-a}\right)=1-a\$\$

The inequality inside the parentheses is quadratic in \$v\$ and involves the unknown term \$\mu_4\$. Accepting a further approximation, we can estimate \$\mu_4\$ from the sample. Then we will obtain

\$\$P\left(Av^2 + Bv +\Gamma\leq 0 \right)=1-a\$\$

The roots of the polynomial are

\$\$v^*_{1,2}= \frac {-B \pm \sqrt {B^2 -4A\Gamma}}{2A}\$\$

and our \$1-a\$ confidence interval for the population variance will be

\$\$\max\Big\{0,\min\{v^*_{1,2}\}\Big\}\leq \sigma^2 \leq \max\{v^*_{1,2}\}\$\$

since the probability that the quadratic polynomial is at most zero equals (in our case, where \$A>0\$) the probability that the population variance lies between the roots of the polynomial.

## Monte Carlo Study

For clarity, denote \$\chi^2_{1,1-a}\equiv z\$.

A little algebra gives us that

\$\$A = n+z, \;\; B = -2n\hat v,\;\; \Gamma = n\hat v^2 -z \hat \mu_4\$\$

\$\$v^*_{1,2}= \frac {n\hat v \pm \sqrt {nz(\hat \mu_4-\hat v^2)+z^2\hat \mu_4}}{n+z}\$\$
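
For completeness, the "little algebra" can be written out as follows. This is only a sketch of the intermediate steps, with \$\hat \mu_4\$ already substituted for \$\mu_4\$: the first line expands the inequality inside the probability, the second simplifies the discriminant.

\$\$
\begin{aligned}
n(\hat v - v)^2 \leq z(\hat \mu_4 - v^2) \;&\Longleftrightarrow\; (n+z)v^2 - 2n\hat v\, v + n\hat v^2 - z\hat \mu_4 \leq 0 \\
B^2 - 4A\Gamma \;&=\; 4n^2\hat v^2 - 4(n+z)(n\hat v^2 - z\hat \mu_4) \;=\; 4\left[nz(\hat \mu_4 - \hat v^2) + z^2\hat \mu_4\right]
\end{aligned}
\$\$

Dividing \$-B \pm \sqrt{B^2 - 4A\Gamma}\$ by \$2A = 2(n+z)\$ then gives the expression for the roots.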

For \$a=0.05\$ we have \$\chi^2_{1,1-a}\equiv z = 3.84\$
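
As an illustration, the interval can be computed from a sample in a few lines of Python. This is a minimal sketch, not the code used for the study: the function name `variance_ci` and the example data are hypothetical, and it assumes the plug-in moments with divisor \$n\$ for both \$\hat v\$ and \$\hat \mu_4\$ (which guarantees \$\hat \mu_4 \geq \hat v^2\$, so the square root is always real). The \$\chi^2_1\$ quantile is obtained as the square of a standard normal quantile.

```python
import math
from statistics import NormalDist


def variance_ci(xs, a=0.05):
    """Asymptotic (1 - a) confidence interval for the population variance."""
    n = len(xs)
    mean = sum(xs) / n
    # Plug-in central moments with divisor n (an assumption of this sketch).
    v_hat = sum((x - mean) ** 2 for x in xs) / n
    mu4_hat = sum((x - mean) ** 4 for x in xs) / n
    # A chi-square(1) quantile is the square of a standard normal quantile.
    z = NormalDist().inv_cdf(1 - a / 2) ** 2
    root = math.sqrt(n * z * (mu4_hat - v_hat**2) + z**2 * mu4_hat)
    lower = max(0.0, (n * v_hat - root) / (n + z))
    upper = (n * v_hat + root) / (n + z)
    return lower, upper


# Hypothetical data, purely for illustration.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 6.1]
low, high = variance_ci(sample)
print(f"95% CI for the variance: [{low:.3f}, {high:.3f}]")
```

Note that the interval is centred at \$n\hat v/(n+z)\$, slightly below \$\hat v\$, although \$\hat v\$ itself always lies inside it.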

I generated \$10,000\$ samples each of size \$n=100\$ from a Gamma distribution with shape parameter \$k=3\$ and scale parameter \$\theta = 2\$. The true mean is \$\mu = 6\$, and the true variance is \$v=\sigma^2 =12\$.

Results:
The sample distribution of the sample variance had a long road ahead to become normal, but this is to be expected for the small sample size chosen. Its average value though was \$11.88\$, pretty close to the true value.

The upper bound was smaller than the true variance in \$1,456\$ samples, while the lower bound was greater than the true variance only \$17\$ times. So the true value was missed by the CI in \$14.73\$% of the samples, mostly due to undershooting, giving an empirical coverage of about \$85\$%, which is roughly \$10\$ percentage points below the nominal confidence level of \$95\$%.

On average the lower bound was \$7.20\$, while on average the upper bound was \$15.68\$. The average length of the CI was \$8.47\$. Its minimum length was \$2.56\$ while its maximum length was \$34.52\$.
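
A study of this kind can be reproduced along the following lines. This is a rough sketch under stated assumptions, not the original simulation code: the helper names `ci_bounds` and `coverage` are hypothetical, the moments use divisor \$n\$, the seed is arbitrary, and the default number of replications is reduced to keep the pure-Python run short, so the counts will differ from those reported above.

```python
import math
import random
from statistics import NormalDist


def ci_bounds(xs, z):
    """Roots of the quadratic in v, with the lower root truncated at zero."""
    n = len(xs)
    mean = sum(xs) / n
    v_hat = sum((x - mean) ** 2 for x in xs) / n
    mu4_hat = sum((x - mean) ** 4 for x in xs) / n
    root = math.sqrt(n * z * (mu4_hat - v_hat**2) + z**2 * mu4_hat)
    return max(0.0, (n * v_hat - root) / (n + z)), (n * v_hat + root) / (n + z)


def coverage(reps=2000, n=100, shape=3.0, scale=2.0, a=0.05, seed=0):
    """Empirical coverage of the interval for Gamma(shape, scale) samples."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - a / 2) ** 2
    true_var = shape * scale**2  # variance of a Gamma(k, theta) is k * theta^2
    too_low = too_high = 0
    for _ in range(reps):
        # random.gammavariate takes (shape, scale) in that order.
        xs = [rng.gammavariate(shape, scale) for _ in range(n)]
        lower, upper = ci_bounds(xs, z)
        if upper < true_var:
            too_low += 1   # whole interval below the true variance
        elif lower > true_var:
            too_high += 1  # whole interval above the true variance
    return 1 - (too_low + too_high) / reps, too_low, too_high


cov, too_low, too_high = coverage()
print(f"coverage={cov:.3f}, undershoots={too_low}, overshoots={too_high}")
```

Calling `coverage(reps=10_000)` matches the number of samples used above; raising `n` shows how quickly the empirical coverage approaches the nominal level.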
