I know that the statement in question is wrong because estimators cannot have asymptotic variances that are lower than the Cramer-Rao bound.

However, if asymptotic consistency means that an estimator converges in probability to a value, then doesn't this also mean that its variance goes to 0?

Where in this train of thought am I wrong?

#### Best Answer

Convergence of a sequence of random variables in probability does not imply convergence of their variances, nor even that their variances get anywhere near $0.$ In fact, their means may converge to a constant yet their variances can still diverge.

### Examples and counterexamples

Construct counterexamples by creating ever more rare events that are increasingly far from the mean: the squared distance from the mean can overwhelm the decreasing probability and cause the variance to do anything (as I will proceed to show).

For instance, scale a Bernoulli$(1/n)$ variate by $n^{p}$ for some power $p$ to be determined. That is, define the sequence of random variables $X_n$ by

$$\begin{aligned} &\Pr(X_n=n^{p})=1/n \\ &\Pr(X_n=0)= 1 - 1/n. \end{aligned}$$

As $n\to \infty$, because $\Pr(X_n=0)\to 1$ this converges in probability to $0;$ its expectation $n^{p-1}$ even converges to $0$ provided $p\lt 1;$ but for $p\gt 1/2$ its variance $n^{2p-1}(1-1/n)$ diverges.
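These formulas are easy to check numerically. The following is a minimal Python sketch, not part of the original argument; the function name `scaled_bernoulli_moments` and the choice $p=0.75$ are illustrative assumptions. It computes the exact moments directly from the two-point distribution of $X_n$.

```python
def scaled_bernoulli_moments(n, p):
    """Exact moments of X_n, where P(X_n = n**p) = 1/n and P(X_n = 0) = 1 - 1/n.

    Hypothetical helper for illustration only.
    """
    prob_nonzero = 1.0 / n
    value = float(n) ** p
    mean = value * prob_nonzero                  # equals n**(p-1)
    second_moment = value ** 2 * prob_nonzero    # equals n**(2p-1)
    variance = second_moment - mean ** 2         # equals n**(2p-1) * (1 - 1/n)
    return prob_nonzero, mean, variance


# With 1/2 < p < 1 the probability of a nonzero value and the mean both
# shrink toward 0 while the variance keeps growing.
for n in (10, 1_000, 100_000):
    prob, mean, var = scaled_bernoulli_moments(n, p=0.75)
    print(f"n={n:>7}  P(X_n != 0)={prob:.5f}  mean={mean:.4f}  variance={var:.2f}")
```

Any $p$ strictly between $1/2$ and $1$ should show the same pattern; only the rates change.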

### Comments

Many other behaviors are possible:

Because negative powers $2p-1$ of $n$ converge to $0,$ the variance converges to $0$ for $p\lt 1/2:$ the variables "squeeze down" to $0$ in some sense.

An interesting edge case is $p=1/2,$ for which the variance converges to $1.$

By varying $p$ above and below $1/2$ depending on $n$ you can even make the variance not converge at all. For instance, let $p(n)=0$ for even $n$ and $p(n)=1$ for odd $n.$
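The behaviors listed in these comments can be tabulated from the variance formula $n^{2p-1}(1-1/n)$. This is a small Python sketch under that formula; the helper names `variance_x` and `p_alternating` are made up for illustration.

```python
def variance_x(n, p):
    """Variance of X_n from the closed form n**(2p-1) * (1 - 1/n)."""
    return float(n) ** (2 * p - 1) * (1 - 1 / n)


def p_alternating(n):
    """The oscillating choice from the text: p(n) = 0 for even n, 1 for odd n."""
    return 0 if n % 2 == 0 else 1


print(f"{'n':>7} {'p=0.25':>10} {'p=0.5':>10} {'p=0.75':>10} {'p=p(n)':>12}")
for n in (10, 11, 1_000, 1_001, 100_000, 100_001):
    fixed = [variance_x(n, p) for p in (0.25, 0.5, 0.75)]
    oscillating = variance_x(n, p_alternating(n))
    print(f"{n:>7} {fixed[0]:>10.4f} {fixed[1]:>10.4f} {fixed[2]:>10.2f} {oscillating:>12.4f}")
```

The first three columns should shrink toward $0$, settle near $1,$ and grow without bound, respectively, while the last column jumps between values near $0$ (even $n$) and values near $n$ (odd $n$).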

### A direct connection with estimation

Finally, a reasonable possible objection is that abstract sequences of random variables are not really "estimators" of anything. But they can nevertheless be involved in estimation. For instance, let $t_n$ be a sequence of statistics, intended to estimate some numerical property $\theta(F)$ of the common distribution of an (arbitrarily large) iid random sample $(Y_1,Y_2,\ldots,Y_n,\ldots)$ of $F.$ This induces a sequence of random variables

$$T_n = t_n(Y_1,Y_2,\ldots,Y_n).$$

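Modify this sequence by choosing any value of $p$ (as above) you like (with $p\lt 1$ if the modified estimator is to remain consistent, since the centering term $n^{p-1}$ must then vanish) and set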

$$T^\prime_n = T_n + (X_n - n^{p-1}).$$

The parenthesized term makes a zero-mean adjustment to $T_n,$ so that if $T_n$ is a reasonable estimator of $\theta(F),$ then so is $T^\prime_n.$ (With some imagination we can conceive of situations where $T_n^\prime$ could yield *better* estimates than $T_n$ with probability close to $1.$) However, if you make the $X_n$ independent of $Y_1,\ldots, Y_n,$ the variance of $T^\prime_n$ will be the sum of the variances of $T_n$ and $X_n,$ which you thereby can cause to diverge.
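To see this construction in action, here is a rough simulation sketch in Python. Everything specific in it is an assumption made for illustration and is not in the argument above: $T_n$ is taken to be the sample mean of $n$ iid Normal$(\theta,1)$ observations (drawn directly from its exact Normal$(\theta,1/n)$ distribution to keep the code fast), $\theta=0,$ $p=0.75,$ and the function names are hypothetical.

```python
import random


def simulate_modified_estimator(n, p, theta=0.0, reps=100_000, seed=0):
    """Draw `reps` realizations of T'_n = T_n + (X_n - n**(p-1)).

    Assumption for illustration: T_n is the mean of n iid Normal(theta, 1)
    draws, so T_n itself is exactly Normal(theta, 1/n).
    """
    rng = random.Random(seed)
    shift = float(n) ** (p - 1)   # E[X_n], subtracted so the adjustment has mean zero
    spike = float(n) ** p         # the rare, large value of X_n
    draws = []
    for _ in range(reps):
        t_n = rng.gauss(theta, n ** -0.5)
        # X_n is generated independently of the sample behind T_n.
        x_n = spike if rng.random() < 1.0 / n else 0.0
        draws.append(t_n + x_n - shift)
    return draws


def summarize(draws, theta=0.0, eps=0.5):
    """Return (sample mean, sample variance, fraction within eps of theta)."""
    mean = sum(draws) / len(draws)
    variance = sum((d - mean) ** 2 for d in draws) / len(draws)
    frac_close = sum(abs(d - theta) <= eps for d in draws) / len(draws)
    return mean, variance, frac_close


def theoretical_variance(n, p):
    """Var(T_n) + Var(X_n) under independence: 1/n + n**(2p-1) * (1 - 1/n)."""
    return 1.0 / n + float(n) ** (2 * p - 1) * (1 - 1.0 / n)


for n in (10, 100, 1_000):
    mean, variance, frac_close = summarize(simulate_modified_estimator(n, p=0.75))
    print(f"n={n:>5}  mean={mean:+.3f}  sample var={variance:.2f}  "
          f"theory var={theoretical_variance(n, 0.75):.2f}  "
          f"P(|T'_n - theta| <= 0.5) ~ {frac_close:.3f}")
```

Under these assumptions the fraction of draws close to $\theta$ should climb toward $1$ as $n$ grows even as the variance increases. The sample variance is itself noisy for large $n,$ because it is driven by a handful of rare, huge values of $X_n.$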
