An acquaintance of mine has been using this wrong inference formula for years: given

- a i.i.d. sample $mathbf{X}={X_1,dots,X_N}$ for a continuous RV $X$,
- sample mean $bar{X}=frac{sum X_i}{N}$ and sample standard deviation $bar{sigma}=frac{sum left(X_i-bar{X}right)^2}{N-1}$

estimate the 0.95-quantile $q_{0.95}$ as

$$q_{0.95} = bar{X} + 2 bar{sigma}$$

(which is not even a decent point estimate – you should at the very least use $q_{0.95} = bar{X} + 1.645bar{sigma}$).

What are the correct confidence intervals for a generic $q$, in the three cases:

- we know nothing on $X$ (apart from the fact that it's continuous), and we look for an exact (non-asymptotic) answer. I think this answer for the median could be modified for a generic quantile
- as before, but an asymptotic solution is fine. I guess there should be at least a couple answers here…one for quantiles which aren't close to 0 or 1, and one for quantiles which are. Maybe one based on normal approximation and one based on Poisson?
- finally, we assume $X$ to have a Gaussian distribution with unknown mean and variance.

**Contents**hide

#### Best Answer

- The first case was answered in detail in this question.
- One example of the second case is shown here, where the authors apply a normal approximation to the binomial distribution used in calculations of the first case.

The third case is given by Hahn and Meeker in their handbook *Statistical Intervals* (2nd ed., Wiley 2017):

A two-sided $100(1-alpha)%$ confidence interval for $x_q$, the $q$ quantile of the normal distribution, is

$$ left[bar{x}-t_{(1-alpha/2;,n-1,,delta)}frac{s}{sqrt{n}},;bar{x}-t_{(alpha/2;,n-1,,delta)}frac{s}{sqrt{n}}right] $$ where $t_{(gamma;,n-1,,delta)}$ is the $gamma$ quantile of a noncentral $t$-distribution with $n-1$ degrees of freedom and noncentrality parameter $delta = -sqrt{n}z_{(q)}=sqrt{n}z_{(1-p)}$.

Here, $z_{(q)}$ denotes $Phi^{-1}(q)$, the $q$ quantile of the standard normal distribution.

For example, let's assume we drew $n=20$ samples from a normal distribution with unknown mean and standard deviation. The sample mean was $bar{x}=10.5$ and the sample standard deviation was $s=3.19$. Then, the two-sided $95%$ confidence interval for the $q=0.25$ quantile $x_{0.25}$ would be given by $(6.42; 9.76)$.

Here is some `R`

code and a small simulation to check the coverage. By changing the parameters, you can run your own simulations:

`normquantCI <- function(x, conf_level = 0.95, q = 0.5) { x <- na.omit(x) n <- length(x) xbar <- mean(x) s <- sd(x) ncp <- -sqrt(n)*qnorm(q) tval <- qt(c((1 + conf_level)/2, (1 - conf_level)/2), n - 1, ncp) se <- s/sqrt(n) xbar - tval*se } # Simulate the coverage set.seed(142857) q <- 0.25 # Quantile to calculate the CI for conf_level <- 0.95 # Confidence level true_mean <- 100 # The true mean of the normal distribution true_sd <- 15 # True sd of the normal distribution sampsi <- 20 # The sample size trueq <- qnorm(q, true_mean, true_sd) # The true quantile res <- replicate(1e5, { citmp <- normquantCI(rnorm(sampsi, true_mean, true_sd), conf_level = conf_level, q = q) ifelse(citmp[1] < trueq & citmp[2] > trueq, 1, 0) }) sum(res)/length(res) [1] 0.95043 `

### Similar Posts:

- Solved – confusion regarding confidence interval of normal distribution
- Solved – How to get any quantiles given median value and margin of error
- Solved – What does (ttest$conf.int[2] – ttest$conf.int[1])/2 mean
- Solved – Is centering needed when bootstrapping the sample mean
- Solved – Confidence interval for the population mean