I'm just learning some stats, so please forgive me if this is simple, but I couldn't find a good explanation.
Let $X \sim \mathcal{N}(\mu,\sigma^2)$ and $Y = e^X$. To find an approximately 95% confidence interval, note
\begin{align*}
P(a \leq Y \leq b) & = P(a \leq e^X \leq b) \\
& = P(\log a \leq X \leq \log b) \\
& = P\left(\frac{\log a - \mu}{\sigma} \leq Z \leq \frac{\log b - \mu}{\sigma}\right) \\
& = \frac{1}{\sqrt{2\pi}} \int_{\frac{\log a - \mu}{\sigma}}^{\frac{\log b - \mu}{\sigma}} e^{-z^2/2} \, dz \\
& \triangleq 0.95,
\end{align*}
for which we know
\begin{align*}
\frac{\log b - \mu}{\sigma} & \approx 2 \iff b = e^{\mu + 2\sigma}, \\
\frac{\log a - \mu}{\sigma} & \approx -2 \iff a = e^{\mu - 2\sigma}.
\end{align*}
Then, my understanding of a confidence interval (CI) would lead me to believe 95% of the values of $Y$ should lie within the interval
$$
[e^{\mu + \sigma^2/2} - e^{\mu - 2\sigma}, e^{\mu + \sigma^2/2} + e^{\mu + 2\sigma}],
$$
where $e^{\mu + \sigma^2/2}$ is the mean of $Y$. Is this correct? Specifically, when we speak of a "95% confidence interval," do we mean that 95% of the values lie within the mean of the random variable, or another average like median or mode?
Finally, I'd like to clear up a source of confusion about notation. For a normally-distributed random variable $X \sim \mathcal{N}(\mu,\sigma^2)$, the variance $\sigma^2$ is also the square of the standard deviation (SD) $\sigma$, for which an approximate 95% confidence interval is $[\mu - 2\sigma, \mu + 2\sigma]$. Similarly, for a lognormally-distributed random variable $Y = e^X$, the variance is given by $(e^{\sigma^2} - 1) e^{2\mu + \sigma^2}$, and I believe its standard deviation would again just be the square root of this (by definition), namely $\left(\sqrt{e^{\sigma^2} - 1}\right) e^{\mu + \sigma^2/2}$. But now we don't have that an approximate 95% confidence interval is $[\text{mean} - 2\,\text{SD}, \text{mean} + 2\,\text{SD}]$ since the pdf of $Y$ is not symmetric.
So, is the $\text{mean} \pm \text{SD}$ property for a confidence interval only valid for normal random variables?
Best Answer
> Is this correct?
No.
i) This isn't a confidence interval you're calculating (since those are for parameters or functions of them), nor is it really a prediction interval, a tolerance interval, or any of the more common statistical intervals … since for starters it's based on known population values, not on a sample.
ii) You already calculated the limits of an interval that includes 95% of the probability; it's $(a,b)$, not $(\mu-a,\mu+b)$.
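As a quick numerical check of point ii), here is a minimal sketch using only the Python standard library. The parameter values $\mu = 0$, $\sigma = 1$ and the helper names `phi` and `lognormal_cdf` are my own choices for illustration, not anything from the question. It compares the probability inside $(a,b)$ with the probability inside the interval the question proposed, (mean $- a$, mean $+ b$).

```python
import math

def phi(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lognormal_cdf(y, mu, sigma):
    """CDF of Y = exp(X) with X ~ N(mu, sigma^2); zero for y <= 0."""
    if y <= 0:
        return 0.0
    return phi((math.log(y) - mu) / sigma)

# Illustrative parameter values (an assumption, chosen only for the demo).
mu, sigma = 0.0, 1.0

# The limits derived in the question.
a = math.exp(mu - 2 * sigma)
b = math.exp(mu + 2 * sigma)
coverage_ab = lognormal_cdf(b, mu, sigma) - lognormal_cdf(a, mu, sigma)

# The interval proposed in the question: (mean - a, mean + b).
mean_y = math.exp(mu + sigma**2 / 2)
coverage_shifted = (lognormal_cdf(mean_y + b, mu, sigma)
                    - lognormal_cdf(mean_y - a, mu, sigma))

print(f"P(a <= Y <= b)               = {coverage_ab:.4f}")
print(f"P(mean - a <= Y <= mean + b) = {coverage_shifted:.4f}")
```

With these values the first probability is about 0.954 (the same as the normal's $\pm 2\sigma$ mass, as it must be), while the second falls well short of 95%.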
> do we mean that 95% of the values lie within the mean of the random variable
No. The mean is a single value. How can 95% of a continuous distribution lie "within" a single value?
> But now we don't have that an approximate 95% confidence interval is $[\text{mean} - 2\,\text{SD}, \text{mean} + 2\,\text{SD}]$ since the pdf of $Y$ is not symmetric.
Just because the density isn't symmetric doesn't of itself mean that a symmetric interval can't include 95% of the probability.
It doesn't include 95%, as it happens, though it's often fairly close to 95% for unimodal distributions. However, while it works pretty well for $\pm 2\sigma$, that doesn't always carry over nearly as well to other numbers of sds not close to 2.
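To see how close "fairly close" is for the lognormal, here is a rough sketch (standard library only) that evaluates the probability within mean $\pm\, k$ SD for a few values of $k$ and $\sigma$. The values of $\sigma$ tried, the choice $\mu = 0$, and the function names are assumptions made purely for illustration.

```python
import math

def phi(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lognormal_cdf(y, mu, sigma):
    """CDF of Y = exp(X) with X ~ N(mu, sigma^2); zero for y <= 0."""
    if y <= 0:
        return 0.0
    return phi((math.log(y) - mu) / sigma)

def mean_sd_coverage(mu, sigma, k):
    """Probability that a lognormal Y lies within k SDs of its own mean."""
    mean = math.exp(mu + sigma**2 / 2)
    sd = math.sqrt(math.exp(sigma**2) - 1) * mean
    return (lognormal_cdf(mean + k * sd, mu, sigma)
            - lognormal_cdf(mean - k * sd, mu, sigma))

def normal_coverage(k):
    """Probability that a normal variable lies within k SDs of its mean."""
    return phi(k) - phi(-k)

for sigma in (0.25, 0.5, 1.0):  # illustrative values only
    row = ", ".join(f"k={k}: {mean_sd_coverage(0.0, sigma, k):.4f}"
                    for k in (1, 2, 3))
    print(f"lognormal, sigma={sigma}: {row}")

print("normal:", ", ".join(f"k={k}: {normal_coverage(k):.4f}" for k in (1, 2, 3)))
```

By my calculation, for $\sigma = 1$ this gives roughly 0.91, 0.96 and 0.98 for $k = 1, 2, 3$, against roughly 0.68, 0.95 and 0.997 for the normal: near 95% at $k = 2$, but noticeably off at $k = 1$ and $k = 3$.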
> So, is the $\text{mean} \pm \text{SD}$ property for a confidence interval only valid for normal random variables?
(Again, keeping in mind that it's not a confidence interval)
Well, actually, for normal random variables, 95% of the distribution is within 1.96 sd's of the mean and 95.4% is within 2 sd's of the mean.
Those numbers are calculated from the normal distribution function; $\Phi(1.96)-\Phi(-1.96)=0.9500$ and $\Phi(2)-\Phi(-2)=0.9545$.
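If you want to reproduce those two figures yourself, one way (a small sketch, using the identity $\Phi(k)-\Phi(-k)=\operatorname{erf}(k/\sqrt{2})$ and a helper name of my own choosing) is:

```python
import math

def central_normal_mass(k):
    """P(-k <= Z <= k) for standard normal Z, via Phi(k) - Phi(-k) = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2.0))

print(f"{central_normal_mass(1.96):.4f}")  # 0.9500
print(f"{central_normal_mass(2.0):.4f}")   # 0.9545
```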