# Solved – Confidence Interval of a Lognormal Random Variable

Just learning some stats, so please forgive if this is simple but I couldn't find a good explanation.

Let \$X \sim \mathcal{N}(\mu,\sigma^2)\$ and \$Y = e^X\$. To find an approximately 95% confidence interval, note
\begin{align*}
P(a \leq Y \leq b) & = P(a \leq e^X \leq b) \\
& = P(\log a \leq X \leq \log b) \\
& = P\left(\frac{\log a - \mu}{\sigma} \leq Z \leq \frac{\log b - \mu}{\sigma}\right) \\
& = \frac{1}{\sqrt{2\pi}} \int_{\frac{\log a - \mu}{\sigma}}^{\frac{\log b - \mu}{\sigma}} e^{-z^2/2} \, dz \\
& \triangleq 0.95,
\end{align*}
for which we know
\begin{align*}
\frac{\log b - \mu}{\sigma} & \approx 2 \iff b = e^{\mu + 2\sigma}, \\
\frac{\log a - \mu}{\sigma} & \approx -2 \iff a = e^{\mu - 2\sigma}.
\end{align*}
Then, my understanding of a confidence interval (CI) would lead me to believe 95% of the values of \$Y\$ should lie within the interval
\$\$
[e^{\mu + \sigma^2/2} - e^{\mu - 2\sigma}, e^{\mu + \sigma^2/2} + e^{\mu + 2\sigma}],
\$\$
where \$e^{\mu + \sigma^2/2}\$ is the mean of \$Y\$. Is this correct? Specifically, when we speak of a "95% confidence interval," do we mean that 95% of the values lie within the mean of the random variable, or another average like median or mode?

Finally, to clear up a source of confusion on notation. For a normally-distributed random variable \$X \sim \mathcal{N}(\mu,\sigma^2)\$, the variance \$\sigma^2\$ is also the square of the standard deviation (SD) \$\sigma\$, for which an approximate 95% confidence interval is \$[\mu - 2\sigma, \mu + 2\sigma]\$. Similarly for a lognormally-distributed random variable \$Y = e^X\$, its variance is given by \$(e^{\sigma^2} - 1) e^{2\mu + \sigma^2}\$, and I believe its standard deviation would again just be the square root of this (by definition), namely \$\left(\sqrt{e^{\sigma^2} - 1}\right) e^{\mu + \sigma^2/2}\$. But now we don't have that an approximate 95% confidence interval is \$[\text{mean} - 2\,\text{SD}, \text{mean} + 2\,\text{SD}]\$ since the pdf of \$Y\$ is not symmetric.

So, is the \$\text{mean} \pm \text{SD}\$ property for a confidence interval only valid for normal random variables?

> Is this correct?

No.

i) This isn't a confidence interval you're calculating (since those are for parameters or functions of them), nor is it really a prediction interval, a tolerance interval, or any of the more common statistical intervals … since for starters it's based on known population values, not on a sample.

ii) You already calculated the limits of an interval that includes 95% of the probability; it's \$(a,b)\$, not \$(\mu-a,\mu+b)\$.
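To make point ii) concrete, here is a minimal sketch that compares the probability inside \$(a,b)\$ with the probability inside the interval proposed in the question (centred on the mean of \$Y\$). The values `mu = 0.0` and `sigma = 0.5` are illustrative assumptions, not taken from the question, and the helper names are hypothetical; only the Python standard library is used.

```python
import math

def norm_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lognormal_cdf(y, mu, sigma):
    """P(Y <= y) for Y = exp(X) with X ~ N(mu, sigma^2)."""
    if y <= 0.0:
        return 0.0  # a lognormal variable is strictly positive
    return norm_cdf((math.log(y) - mu) / sigma)

# Illustrative parameter values (an assumption for this sketch).
mu, sigma = 0.0, 0.5

a = math.exp(mu - 2.0 * sigma)           # lower limit from the derivation
b = math.exp(mu + 2.0 * sigma)           # upper limit from the derivation
mean_y = math.exp(mu + sigma**2 / 2.0)   # mean of Y

# Probability inside (a, b): this is Phi(2) - Phi(-2) by construction.
p_ab = lognormal_cdf(b, mu, sigma) - lognormal_cdf(a, mu, sigma)

# Probability inside the interval proposed in the question,
# [mean_y - a, mean_y + b].
p_question = (lognormal_cdf(mean_y + b, mu, sigma)
              - lognormal_cdf(mean_y - a, mu, sigma))

print(f"P(a <= Y <= b)               = {p_ab:.4f}")
print(f"P(mean - a <= Y <= mean + b) = {p_question:.4f}")
```

With these assumed parameters the first probability matches the normal two-sd figure regardless of `mu` and `sigma`, while the second depends on the parameters and falls well short of it.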

> do we mean that 95% of the values lie within the mean of the random variable

No. The mean is a single value. How can 95% of a continuous distribution lie "within" a single value?

> But now we don't have that an approximate 95% confidence interval is \$[\text{mean} - 2\,\text{SD}, \text{mean} + 2\,\text{SD}]\$ since the pdf of \$Y\$ is not symmetric.

Just because the density isn't symmetric doesn't of itself mean that a symmetric interval can't include 95% of the probability.

It doesn't include 95%, as it happens, though it's often fairly close to 95% for unimodal distributions. However, while it works pretty well for \$\pm 2\sigma\$, that doesn't always carry over nearly as well to other numbers of sds not close to 2.
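As a rough check of this claim, the sketch below computes how much of a lognormal distribution lies within \$k\$ standard deviations of its mean, next to the corresponding normal figure. The parameters `mu = 0.0` and `sigma = 0.5` are again illustrative assumptions, and the function names are hypothetical; other parameter choices will give different numbers.

```python
import math

def norm_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lognormal_cdf(y, mu, sigma):
    """P(Y <= y) for Y = exp(X) with X ~ N(mu, sigma^2)."""
    if y <= 0.0:
        return 0.0  # no probability mass at or below zero
    return norm_cdf((math.log(y) - mu) / sigma)

def lognormal_coverage(k, mu, sigma):
    """Probability that Y lies in [mean - k*SD, mean + k*SD]."""
    mean = math.exp(mu + sigma**2 / 2.0)
    sd = math.sqrt(math.exp(sigma**2) - 1.0) * mean
    return (lognormal_cdf(mean + k * sd, mu, sigma)
            - lognormal_cdf(mean - k * sd, mu, sigma))

def normal_coverage(k):
    """Probability that a normal variable lies within k SDs of its mean."""
    return norm_cdf(k) - norm_cdf(-k)

# Illustrative parameter values (an assumption for this sketch).
mu, sigma = 0.0, 0.5

for k in (1.0, 2.0, 3.0):
    print(f"k = {k:.0f}: lognormal {lognormal_coverage(k, mu, sigma):.4f}, "
          f"normal {normal_coverage(k):.4f}")
```

For these assumed values the \$k = 2\$ coverage lands close to 95%, while the \$k = 1\$ and \$k = 3\$ figures differ more noticeably from their normal counterparts.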

> So, is the mean ± SD property for a confidence interval only valid for normal random variables?

(Again, keeping in mind that it's not a confidence interval)

Well, actually, for normal random variables, 95% of the distribution is within 1.96 sd's of the mean and 95.4% is within 2 sd's of the mean.

Those numbers are calculated from the normal distribution function; \$\Phi(1.96)-\Phi(-1.96)=0.9500\$ and \$\Phi(2)-\Phi(-2)=0.9545\$.
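These two figures can be reproduced with a few lines of standard-library Python. This is a minimal sketch: `norm_cdf` and `central_coverage` are hypothetical helper names, and \$\Phi\$ is written in terms of `math.erf`.

```python
import math

def norm_cdf(z):
    """Standard normal CDF Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def central_coverage(k):
    """Phi(k) - Phi(-k): probability within k SDs of the mean."""
    return norm_cdf(k) - norm_cdf(-k)

print(f"{central_coverage(1.96):.4f}")  # coverage within 1.96 SDs
print(f"{central_coverage(2.0):.4f}")   # coverage within 2 SDs
```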
