Solved – Deriving Mean Square Error of an Estimator of a Quantile

I'm trying to work out the MSE of an estimator $T$ for a quantile of a normal distribution. So,
$$
T(X_1,\dots,X_n) = \overline X + z_q S,
$$
where $x_q = \mu + z_q \sigma$ and $z_q$ is the corresponding lower quantile of $Z \sim N(0,1)$.

I know that the MSE of this is $\mathbb{E}((T - x_q)^2)$, but I want the MSE as a function of $\mu$, $\sigma$, $n$, and $q$.

What I have so far:
$$
\begin{aligned}
\text{MSE}(T) &= \mathbb{E}((T - x_q)^2) = \text{Var}(T) + (\mathbb{E}(T) - x_q)^2 \\
&= S + (\hat X - x_q)^2
\end{aligned}
$$
But I'm unsure where to go next.

This is simpler than it looks, because for an iid Normal sample the mean $\bar X$ is independent of the sample standard deviation $S$. Therefore the variance (the "MSE") of their linear combination $$T = \bar X + z_q S$$ can be expressed as a linear combination of their variances. Let's find those variances and then do the algebra.

For a standard Normal distribution with $\mu=0$ and $\sigma=1$, the variance of the mean is $\sigma^2/n = 1/n$, the expectation of $S^2$ is the variance $\sigma^2 = 1$, and (this is the hard part) the expectation of $S$ is

$$c(n) = \mathbb{E}(S) = \sqrt{\frac{2}{n-1}}\, \frac{\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)}.$$
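For completeness, this expectation comes from the sampling distribution of $S$: since $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$ for a Normal sample, the quantity $\sqrt{n-1}\,S/\sigma$ has a chi distribution with $n-1$ degrees of freedom, whose mean is a known Gamma-function ratio:

$$\mathbb{E}\left(\frac{\sqrt{n-1}\,S}{\sigma}\right) = \sqrt{2}\,\frac{\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)},$$

and dividing by $\sqrt{n-1}$ (with $\sigma = 1$) recovers $c(n)$.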

Therefore, using the fact that $\operatorname{Var}(S) = \mathbb{E}(S^2) - \mathbb{E}(S)^2$,

$$\operatorname{Var}(T) = \operatorname{Var}(\bar X) + z_q^2 \operatorname{Var}(S) = \frac{1}{n} + z_q^2(1 - c(n)^2).\tag{1}$$

Because $T$ scales in proportion to $\sigma$ and is merely shifted by $\mu$, its variance for an underlying Normal$(\mu,\sigma^2)$ distribution is $\sigma^2$ times this value. Note that $T$ is biased, since $\mathbb{E}(T) = \mu + z_q c(n)\sigma \ne x_q$; the full MSE adds the squared bias $z_q^2\sigma^2(1 - c(n))^2$ to the variance, which simplifies to $\sigma^2\left(\frac{1}{n} + 2 z_q^2 (1 - c(n))\right)$. Because $c(n) \to 1$ quickly, the variance term dominates for moderate $n$.


This can be confirmed via simulation, as shown in the appended R code. After you specify the values of all parameters in the question ($n$, $z_q$, $\sigma$, and $\mu$) and the size of the simulation (as n.sim), the code computes n.sim realizations of $T$, plots their histogram (to confirm it is behaving as expected), and finishes by reporting the variance of those realizations alongside the theoretical variance of $T$ (the "MSE") given by $\sigma^2$ times formula $(1)$.

A computational subtlety attends the calculation of $c(n)$: for $n > 343$, $\Gamma(n/2)$ overflows double-precision arithmetic. It is therefore essential to compute $c(n)^2$ in $(1)$ using logarithms of the Gamma function, via the formula

$$c(n)^2 = \frac{2}{n-1}\exp\left(2\left(\log\Gamma\left(\frac{n}{2}\right) - \log\Gamma\left(\frac{n-1}{2}\right)\right)\right).$$
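A quick sketch of why this matters (the variable names here are illustrative, not from the original code): for $n = 400$, the direct ratio of Gamma functions evaluates as Inf/Inf and yields NaN, while the log-Gamma route is stable.

```r
n <- 400

# Direct computation: gamma(200) overflows double precision (Inf),
# so the ratio becomes Inf/Inf = NaN
c2.naive <- 2/(n-1) * (gamma(n/2) / gamma((n-1)/2))^2

# Stable computation via logarithms of the Gamma function
c2.log <- 2/(n-1) * exp(2*(lgamma(n/2) - lgamma((n-1)/2)))

print(c(c2.naive, c2.log))  # NaN versus a value just below 1
```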

Here is an example of the simulation. After issuing the command set.seed(17) (to initialize the random number generator), its output for $\mu=-40$, $\sigma=5$, $n=8$, $z_q=3$ and $10{,}000$ iterations was

Variance is 18.52; expected is 18.59

The values are close. They remain consistently close when the input parameters are varied.
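As a hand check of the theoretical value, plug the example's parameters into formula $(1)$. With $n = 8$ we have $\Gamma(4) = 6$ and $\Gamma(7/2) = \frac{15}{8}\sqrt{\pi} \approx 3.3234$, so

$$c(8)^2 = \frac{2}{7}\left(\frac{6}{3.3234}\right)^2 \approx 0.9313,
\qquad
25\left(\frac{1}{8} + 9\,(1 - 0.9313)\right) \approx 18.59,$$

in agreement with the reported expected value.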

n <- 8        # sample size
z <- 3        # quantile multiplier z_q
sigma <- 5
mu <- -40
n.sim <- 1e4  # number of simulated samples

set.seed(17)
x <- matrix(rnorm(n*n.sim, mu, sigma), nrow=n)  # one sample per column
T <- apply(x, 2, function(y) mean(y) + z * sd(y))
hist(T)

# Theoretical variance (1), computed stably with lgamma
v <- 1/n + z^2 * (1 - 2/(n-1) * exp(2*(lgamma(n/2) - lgamma((n-1)/2))))
message("Variance is ", signif(var(T), 4), "; expected is ", signif(sigma^2 * v, 4))
