Solved – Bounds for the population variance

Suppose we have i.i.d. samples $x_1, \ldots, x_n$ of a (potentially non-normal) random variable $X$ with finite moments. We can use these samples to construct unbiased estimates of the population mean and population variance:
$$
\bar{x} = n^{-1} \sum_{i=1}^n x_i \qquad\text{and}\qquad s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \enspace.
$$
Without making any assumptions on the distribution of $X$, it is possible to construct probabilistic bounds on the population mean by using Chebyshev's inequality (see, e.g., Wikipedia or the original paper).
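
For reference, a sketch of the bound meant here, written in terms of the true standard deviation $\sigma$ (an assumption for illustration: in practice $\sigma$ is unknown and would itself have to be bounded or estimated). For any $k > 0$,

$$
P\left(|\bar{x} - \mu| \geq \frac{k\sigma}{\sqrt{n}}\right) \leq \frac{1}{k^2} \enspace,
$$

so that, for example, $k = \sqrt{20} \approx 4.47$ gives an interval $\bar{x} \pm k\sigma/\sqrt{n}$ that contains $\mu$ with probability at least $0.95$.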

My question is: do such probabilistic bounds exist for the population variance? In other words, can we say that with probability $\delta$ the population variance $\sigma^2$ will be in some interval $[L(\delta,\{x_i\}),U(\delta,\{x_i\})]$? And if so, what are the functions $L$ and $U$ that describe the lower and upper bound?

For normal distributions, the sample variance follows a $\sigma^2 \chi^2_{n-1} (n-1)^{-1}$ distribution. This can be used to construct confidence intervals. However, I am looking for more general bounds that apply also to non-normal settings.
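
For comparison, the normal-theory interval referred to here can be written as follows, where $\chi^2_{n-1,p}$ denotes the $p$-quantile of the chi-square distribution with $n-1$ degrees of freedom and $a$ is the miss probability (both are notation introduced for this sketch). It is exact only under normality:

$$
\frac{(n-1)s^2}{\chi^2_{n-1,1-a/2}} \leq \sigma^2 \leq \frac{(n-1)s^2}{\chi^2_{n-1,a/2}} \enspace.
$$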

The general result for the asymptotic distribution of the sample variance is (see this post)

$$\sqrt n(\hat v - v) \xrightarrow{d} N\left(0,\mu_4 - v^2\right)$$

where I have used the notation $v\equiv \sigma^2$ to avoid later confusion with squares, and where $\mu_4 = \mathrm{E}\left((X_i -\mu)^4\right)$. Therefore, by the continuous mapping theorem,

$$\frac {n(\hat v - v)^2}{\mu_4 - v^2} \xrightarrow{d} \chi^2_1 $$

Then, accepting the approximation,

$$P\left(\frac {n(\hat v - v)^2}{\mu_4 - v^2}\leq \chi^2_{1,1-a}\right)=1-a$$

The inequality inside the parentheses is a quadratic inequality in $v$ that includes the unknown term $\mu_4$. Accepting a further approximation, we can estimate this term from the sample. Then we obtain

$$P\left(Av^2 + Bv +\Gamma\leq 0 \right)=1-a$$

The roots of the polynomial are

$$v^*_{1,2}= \frac {-B \pm \sqrt {B^2 -4A\Gamma}}{2A}$$

and our $1-a$ confidence interval for the population variance will be

$$\max\Big\{0,\min\{v^*_{1,2}\}\Big\}\leq \sigma^2 \leq \max\{v^*_{1,2}\}$$

since the probability that the quadratic polynomial is non-positive equals (in our case, where $A>0$) the probability that the population variance lies between the roots of the polynomial.


Monte Carlo Study

For clarity, denote $\chi^2_{1,1-a}\equiv z$.

A little algebra gives us that

$$A = n+z, \;\; B = -2n\hat v,\;\; \Gamma = n\hat v^2 -z \hat \mu_4$$

which leads to

$$v^*_{1,2}= \frac {n\hat v \pm \sqrt {nz(\hat \mu_4-\hat v^2)+z^2\hat \mu_4}}{n+z}$$

For $a=0.05$ we have $\chi^2_{1,1-a}\equiv z = 3.84$.
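
The closed-form roots can be turned into a small function. This is only a sketch under stated assumptions: the name `variance_ci` is hypothetical, the fourth central moment is estimated with divisor $n$ (the answer does not say which estimator was used), and the quantity under the square root is clipped at zero to guard against tiny or degenerate samples. It uses the fact that the $\chi^2_1$ quantile at $1-a$ is the square of the standard normal quantile at $1-a/2$, so only the Python standard library is needed.

```python
import math
from statistics import NormalDist, fmean


def variance_ci(xs, a=0.05):
    """Asymptotic (1 - a) interval for the population variance (sketch)."""
    n = len(xs)
    mean = fmean(xs)
    # Unbiased sample variance (divisor n - 1).
    v_hat = sum((x - mean) ** 2 for x in xs) / (n - 1)
    # Plug-in fourth central moment; divisor n is an assumption of this sketch.
    mu4_hat = sum((x - mean) ** 4 for x in xs) / n
    # chi^2_{1,1-a} equals the squared standard normal (1 - a/2) quantile.
    z = NormalDist().inv_cdf(1 - a / 2) ** 2
    # Quantity under the square root, clipped at zero as a safeguard.
    disc = max(n * z * (mu4_hat - v_hat ** 2) + z ** 2 * mu4_hat, 0.0)
    half = math.sqrt(disc)
    lower = max(0.0, (n * v_hat - half) / (n + z))
    upper = (n * v_hat + half) / (n + z)
    return lower, upper


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(variance_ci(data))
```

Note that the interval is centred at $n\hat v/(n+z)$ rather than at $\hat v$, so it is shifted slightly towards zero relative to the point estimate.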

I generated $10{,}000$ samples each of size $n=100$ from a Gamma distribution with shape parameter $k=3$ and scale parameter $\theta = 2$. The true mean is $\mu = 6$, and the true variance is $v=\sigma^2 =12$.

Results:
The sampling distribution of the sample variance was still far from normal, but this is to be expected for the small sample size chosen. Its average value, though, was $11.88$, pretty close to the true value.

The upper bound was smaller than the true variance in $1{,}456$ samples, while the lower bound was greater than the true variance only $17$ times. So the true value was missed by the CI in $14.73\%$ of the samples, mostly due to undershooting, giving an empirical confidence level of about $85\%$, roughly $10$ percentage points below the nominal confidence level of $95\%$.

On average the lower bound was $7.20$, while on average the upper bound was $15.68$. The average length of the CI was $8.47$. Its minimum length was $2.56$ while its maximum length was $34.52$.
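
A study of this kind can be sketched with the standard library alone. The code below is not the original simulation: the function names, the seed, the default of 2,000 replications (chosen to keep the run short) and the divisor-$n$ estimate of $\mu_4$ are all assumptions, so the counts it produces will differ somewhat from the figures reported above.

```python
import math
import random
from statistics import NormalDist, fmean


def variance_ci(xs, z):
    """Interval from the closed-form roots; z is the chi-square(1) quantile."""
    n = len(xs)
    mean = fmean(xs)
    v_hat = sum((x - mean) ** 2 for x in xs) / (n - 1)
    # Divisor n for the fourth central moment is an assumption of this sketch.
    mu4_hat = sum((x - mean) ** 4 for x in xs) / n
    disc = max(n * z * (mu4_hat - v_hat ** 2) + z ** 2 * mu4_hat, 0.0)
    half = math.sqrt(disc)
    return max(0.0, (n * v_hat - half) / (n + z)), (n * v_hat + half) / (n + z)


def simulate(reps=2000, n=100, shape=3.0, scale=2.0, a=0.05, seed=0):
    """Return (coverage, count of upper < true, count of lower > true)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - a / 2) ** 2
    true_var = shape * scale ** 2  # Gamma variance: k * theta^2
    upper_too_low = 0
    lower_too_high = 0
    for _ in range(reps):
        # random.gammavariate takes (shape, scale).
        xs = [rng.gammavariate(shape, scale) for _ in range(n)]
        lower, upper = variance_ci(xs, z)
        if upper < true_var:
            upper_too_low += 1
        elif lower > true_var:
            lower_too_high += 1
    coverage = 1 - (upper_too_low + lower_too_high) / reps
    return coverage, upper_too_low, lower_too_high


if __name__ == "__main__":
    cov, below, above = simulate()
    print(f"coverage={cov:.3f}, upper<true: {below}, lower>true: {above}")
```

Setting `reps=10000` matches the number of replications described above, at the cost of a longer run.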
