Intrigued by a question at math.stackexchange, and investigating it empirically, I am wondering about the following statement on the square-root of sums of i.i.d. random variables.
Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. random variables with finite non-zero mean $\mu$ and variance $\sigma^2$, and $\displaystyle Y=\sum_{i=1}^n X_i$. The central limit theorem says $\displaystyle \dfrac{Y - n\mu}{\sqrt{n\sigma^2}} \xrightarrow{d} N(0,1)$ as $n$ increases.
If $Z=\sqrt{|Y|}$, can I also say something like $\displaystyle \dfrac{Z - \sqrt{n |\mu|-\tfrac{\sigma^2}{4|\mu|}}}{\sqrt{\tfrac{\sigma^2}{4|\mu|}}} \xrightarrow{d} N(0,1)$ as $n$ increases?
For example, suppose the $X_i$ are Bernoulli with mean $p$ and variance $p(1-p)$, then $Y$ is binomial and I can simulate this in R, say with $p=\frac13$:
    set.seed(1)
    cases <- 100000
    n <- 1000
    p <- 1/3
    Y <- rbinom(cases, size=n, prob=p)
    Z <- sqrt(abs(Y))
which gives approximately the hoped-for mean and variance for $Z$

    > c(mean(Z), sqrt(n*p - (1-p)/4))
    [1] 18.25229 18.25285
    > c(var(Z), (1-p)/4)
    [1] 0.1680012 0.1666667
and a Q-Q plot which looks close to Gaussian
    qqnorm(Z)
Best Answer
The convergence to a Gaussian is indeed a general phenomenon.
Suppose that $X_1,X_2,X_3,\ldots$ are IID random variables with mean $\mu > 0$ and variance $\sigma^2$, and define the sums $Y_n=\sum_{i=1}^n X_i$. Fix a number $\alpha$. The usual Central Limit Theorem tells us that $P\Big(\frac{Y_n-n\mu}{\sigma\sqrt n}\leq \alpha\Big)\to\Phi(\alpha)$ as $n\to\infty$, where $\Phi$ is the standard normal cdf. However, the continuity of the limiting cdf implies that we also have $$P\Big(\frac{Y_n-n\mu}{\sigma\sqrt n}\leq \alpha+\frac{\alpha^2 \sigma^2}{4\mu\sigma\sqrt n}\Big)\to\Phi(\alpha)$$ because the additional term on the right-hand side of the inequality tends to zero. Rearranging this expression leads to $$P\Big(Y_n\leq \Big(\frac{\alpha\sigma}{2\sqrt \mu}+\sqrt{n\mu}\Big)^2\Big)\to\Phi(\alpha)$$
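The rearrangement can be checked by multiplying the inequality through by $\sigma\sqrt n$, moving $n\mu$ to the right-hand side, and completing the square:

$$\begin{aligned}
Y_n &\leq n\mu + \alpha\sigma\sqrt{n} + \frac{\alpha^2\sigma^2}{4\mu} \\
&= \left(\sqrt{n\mu}\right)^2 + 2\sqrt{n\mu}\cdot\frac{\alpha\sigma}{2\sqrt{\mu}} + \left(\frac{\alpha\sigma}{2\sqrt{\mu}}\right)^2 \\
&= \left(\frac{\alpha\sigma}{2\sqrt{\mu}} + \sqrt{n\mu}\right)^2 .
\end{aligned}$$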
Taking square roots, and noting that $\mu > 0$ implies that $P(Y_n < 0)\to 0$, we obtain $$P\Big(\sqrt{|Y_n|}\leq \frac{\alpha\sigma}{2\sqrt \mu}+\sqrt{n\mu}\Big)\to\Phi(\alpha)$$ In other words, $\frac{\sqrt{|Y_n|}-\sqrt{n\mu}}{\sigma/(2\sqrt{\mu})}\xrightarrow{d}N(0,1)$. This result demonstrates convergence to a Gaussian in the limit as $n\to\infty$.
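As a rough empirical check of this limit, here is a minimal R sketch that reuses the Bernoulli setup from the question ($p=\frac13$, $n=1000$). The names `mu`, `sigma2`, `W`, `probs` and `comparison` are introduced here for illustration only. Because $Y_n$ is discrete, the empirical quantiles should be expected to agree with the normal ones only approximately.

```r
# Sketch: standardize sqrt(|Y_n|) and compare with N(0,1),
# assuming the question's Bernoulli example (p = 1/3, n = 1000).
set.seed(1)
cases <- 100000
n <- 1000
p <- 1/3
mu <- p                 # mean of a single Bernoulli(p) draw
sigma2 <- p * (1 - p)   # variance of a single Bernoulli(p) draw

Y <- rbinom(cases, size = n, prob = p)
Z <- sqrt(abs(Y))

# Center at sqrt(n * mu) and scale by sigma / (2 * sqrt(mu))
W <- (Z - sqrt(n * mu)) / sqrt(sigma2 / (4 * mu))

# Empirical quantiles of W next to standard normal quantiles
probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)
comparison <- rbind(empirical = quantile(W, probs),
                    normal = qnorm(probs))
print(round(comparison, 3))
```

If the limit holds, `mean(W)` should be close to 0 and `sd(W)` close to 1 for large `n`.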
Does this mean that $\sqrt{n\mu}$ is a good approximation to $E[\sqrt{|Y_n|}]$ for large $n$? Well, we can do better than this. As @Henry notes, assuming everything is positive, we can use $E[\sqrt{Y_n}]=\sqrt{E[Y_n]-\text{Var}(\sqrt{Y_n})}$, together with $E[Y_n]=n\mu$ and the approximation $\text{Var}(\sqrt{Y_n})\approx \frac{\sigma^2}{4\mu}$, to obtain the improved approximation $E[\sqrt{|Y_n|}]\approx\sqrt{n\mu- \dfrac{\sigma^2}{4\mu}}$ as stated in the question above. Note also that we still have $$\frac{\sqrt{|Y_n|}-\sqrt{n\mu-\frac{\sigma^2}{4\mu}}}{\sigma/(2\sqrt{\mu})}\xrightarrow{d}N(0,1)$$ because $\sqrt{n\mu-\frac{\sigma^2}{4\mu}}-\sqrt{n\mu}\to 0$ as $n\to\infty$.
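One heuristic way to see where the variance approximation comes from is a first-order Taylor expansion of the square root around $n\mu$ (the delta method). This expansion is an added sketch, not part of the argument above, and it ignores higher-order terms:

$$\sqrt{Y_n} \approx \sqrt{n\mu} + \frac{Y_n - n\mu}{2\sqrt{n\mu}} \quad\Longrightarrow\quad \text{Var}(\sqrt{Y_n}) \approx \frac{\text{Var}(Y_n)}{4n\mu} = \frac{n\sigma^2}{4n\mu} = \frac{\sigma^2}{4\mu}.$$

The vanishing gap between the two centerings follows from the difference of squares:

$$\sqrt{n\mu} - \sqrt{n\mu - \tfrac{\sigma^2}{4\mu}} = \frac{\sigma^2/(4\mu)}{\sqrt{n\mu} + \sqrt{n\mu - \tfrac{\sigma^2}{4\mu}}} \to 0 \quad\text{as } n\to\infty.$$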