# Solved – Deriving Mean Square Error of an Estimator of a Quantile

I'm trying to work out the MSE of an estimator \$T\$ for a quantile of a normal distribution. So,
\$\$
T(X_1,\dots,X_n) = \overline X + z_q S,
\$\$
where \$x_q = \mu + z_q \sigma\$ and where \$z_q\$ is the corresponding lower quantile of \$Z\sim N(0,1)\$.

I know that the MSE of this is \$\mathbb{E}((T - x_q)^2)\$, but I want the MSE as a function of \$\mu\$, \$\sigma\$, \$n\$, and \$q\$.

What I have so far:
\$\$
\begin{aligned}
\text{MSE}(T) &= \mathbb{E}((T - x_q)^2) = \operatorname{Var}(T) + (\mathbb{E}(T)-x_q)^2 \\
&= S + (\hat X - x_q)^2
\end{aligned}
\$\$
But I'm unsure where to go next.


This is simpler than it looks, because for an iid Normal sample the mean \$\bar X\$ is independent of the sample standard deviation \$S\$. Therefore the variance ("MSE") of their linear combination \$\$T=\bar X + z_\alpha S\$\$ can be expressed as a linear combination of their variances. Let's find those variances and then do the algebra.

For a standard Normal distribution with \$\mu=0\$ and \$\sigma=1\$, the variance of the mean is \$\sigma^2/n=1/n\$, the expectation of \$S^2\$ equals the variance \$\sigma^2=1\$, and (this is the hard part) the expectation of \$S\$ is

\$\$c(n)=\mathbb{E}(S) = \sqrt{\frac{2}{n-1}}\, \frac{\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)}.\$\$
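As a quick sanity check (this snippet is illustrative, not part of the original answer), the closed form for \$c(n)\$ can be compared against a simulated mean of \$S\$ for a small standard-Normal sample:

```r
# Compare the closed form c(n) = E(S) with a Monte Carlo estimate.
n <- 8
c.n <- sqrt(2/(n-1)) * gamma(n/2) / gamma((n-1)/2)  # closed form

set.seed(17)
S.sim <- replicate(1e5, sd(rnorm(n)))  # 1e5 realizations of S
c(closed.form = c.n, simulated = mean(S.sim))
```

The two values agree to about three decimal places, as expected from the Monte Carlo standard error.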

Therefore, using the fact that \$\operatorname{Var}(S) = \mathbb{E}(S^2) - \mathbb{E}(S)^2\$,

\$\$\operatorname{Var}(T) = \operatorname{Var}(\bar X) + z_\alpha^2 \operatorname{Var}(S) = \frac{1}{n} + z_\alpha^2\left(1 - c(n)^2\right).\tag{1}\$\$

Because \$T\$ scales in proportion to \$\sigma\$ and is merely shifted by \$\mu\$, its variance for an underlying Normal\$(\mu,\sigma^2)\$ distribution is \$\sigma^2\$ times this value.
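As a worked instance (added for illustration, using the same parameters as the simulation reported below), formula \$(1)\$ scaled by \$\sigma^2\$ can be evaluated directly:

```r
# Evaluate sigma^2 times formula (1) for n = 8, z_alpha = 3, sigma = 5.
n <- 8; z <- 3; sigma <- 5
c2 <- 2/(n-1) * exp(2 * (lgamma(n/2) - lgamma((n-1)/2)))  # c(n)^2 via log-Gamma
v <- 1/n + z^2 * (1 - c2)  # formula (1), standard-Normal case
sigma^2 * v                # theoretical variance of T
```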

This can be confirmed via simulation, as shown in the appended `R` code. After you specify the values of all parameters in the question (\$n\$, \$z_\alpha\$, \$\sigma\$, and \$\mu\$) and the size of the simulation (as `n.sim`), the code computes `n.sim` realizations of \$T\$, plots their histogram (to confirm it is behaving as expected), and finishes by reporting the variance of those realizations alongside the theoretical variance (the MSE of \$T\$) given by \$\sigma^2\$ times formula \$(1)\$.

A computational subtlety attends the calculation of \$c(n)\$: for \$n \gt 343\$, \$\Gamma(n/2)\$ overflows double-precision arithmetic. It is therefore essential to compute the value of \$c(n)^2\$ in \$(1)\$ using logarithms of the Gamma function, following the formula

\$\$c(n)^2 = \frac{2}{n-1}\exp\left(2\left(\log\Gamma\left(\frac{n}{2}\right) - \log\Gamma\left(\frac{n-1}{2}\right)\right)\right).\$\$
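The difference between the two routes is easy to demonstrate (this snippet is illustrative, not part of the original answer). The naive ratio of Gamma functions produces `NaN` once \$\Gamma(n/2)\$ overflows, while the `lgamma` route remains finite:

```r
# For large n, gamma(n/2) overflows to Inf, so the naive ratio is Inf/Inf = NaN;
# computing on the log scale with lgamma stays stable.
n <- 400
naive  <- 2/(n-1) * (gamma(n/2) / gamma((n-1)/2))^2
stable <- 2/(n-1) * exp(2 * (lgamma(n/2) - lgamma((n-1)/2)))
c(naive = naive, stable = stable)  # stable is just below 1, as Var(S) > 0 requires
```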

Here is an example of the simulation. After issuing the command `set.seed(17)` (to initialize the random number generator), its output for \$\mu=-40\$, \$\sigma=5\$, \$n=8\$, \$z_\alpha=3\$, and \$10{,}000\$ iterations was

```
Variance is 18.52; expected is 18.59
```

The values are close. They remain consistently close when the input parameters are varied.

```r
n <- 8        # sample size
z <- 3        # z_alpha
sigma <- 5
mu <- -40
n.sim <- 1e4  # number of simulated samples

set.seed(17)
x <- matrix(rnorm(n*n.sim, mu, sigma), nrow=n)     # each column is one sample
T <- apply(x, 2, function(y) mean(y) + z * sd(y))  # realizations of T
hist(T)

# sigma^2 times formula (1); c(n)^2 computed via lgamma to avoid overflow
v <- 1/n + z^2 * (1 - 2/(n-1) * exp(2*(lgamma(n/2) - lgamma((n-1)/2))))
message("Variance is ", signif(var(T), 4), "; expected is ", signif(sigma^2 * v, 4))
```
