# Solved – Check uniformly distributed continuous random variable

I have a random variable that I know is uniformly distributed, and I expect its range to be \$[0,1]\$. I then generate (simulate) 100 realizations of the variable.

The question: What is the best way to find the probability that the underlying "actual" range of the variable is in fact \$[0,1]\$, given the realizations?

It's clear that if the 100 realizations all fall within the range \$[0.51, 0.69]\$, the probability is very low, while if they span the range \$[0.01, 0.99]\$, it should be high.

I tried simply computing the ratio of the observed range to the expected range,
\$\$
(0.69-0.51)/(1-0) = 0.18,
\$\$
assumed that this was the probability \$p\$ of each realization, and then computed the probability of it happening 100 times as \$p^{100}\$.

The problem is that, for a correct case, in which the original range was in fact \$[0,1]\$, the realization range turned out to be \$[0.02,0.93]\$ and the computed probability is very low, \$0.91^{100} \approx 8\times 10^{-5}\$, which does not make sense.

So, what is it that I am getting completely wrong?

EDIT. Clarifications, thanks to @whuber:
(1) I am attempting to check that the range of the underlying variable is in fact \$[0,1]\$ (for validation purposes). (2) I just want to check the range and know how likely it is that it is in fact \$[0,1]\$. (3) The realizations are independent: `rand(1,100)` in Matlab.


Although it is meaningless to find a probability (unless you first specify a prior distribution of the endpoints), you can find the relative likelihood. A good basis for comparison would be the alternative hypothesis that the numbers are drawn from a uniform distribution between a lower bound \$L\$ and upper bound \$U\$.

Sufficient statistics are the minimum \$X\$ and maximum \$Y\$ of all the data (assuming each number is obtained independently). It doesn't matter whether you draw the data in batches or not. When drawn from the interval \$[0,1]\$, the joint distribution of \$(X, Y)\$ is continuous and has density

\$\$\eqalign{f(x,y) &= \binom{n}{1,\,n-2,\,1}(y-x)^{n-2}\mathcal{I}(0\le x\le y\le 1) \cr &= n(n-1)(y-x)^{n-2}\mathcal{I}(0\le x\le y\le 1).}\$\$
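As a quick sanity check of this density (a sketch of my own, not part of the original argument): integrating it shows that \$E[Y-X] = (n-1)/(n+1)\$, which is easy to confirm by simulation.

```r
# Sanity check of the joint density of (min, max) for n iid uniform(0,1) draws:
# it implies E[Y - X] = (n - 1)/(n + 1), which we verify empirically.
set.seed(17)
n <- 10
N <- 1e5
ranges <- apply(matrix(runif(n * N), nrow = n), 2, function(x) diff(range(x)))
c(simulated = mean(ranges), theoretical = (n - 1) / (n + 1))
```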

When scaled by \$U-L\$ and shifted by \$L\$, this density becomes

\$\$f_{(L,U)}(x,y) = (U-L)^{-n}\, n(n-1)(y-x)^{n-2}\mathcal{I}(L\le x\le y\le U).\$\$

Obviously this is greatest when \$L = x\$ and \$U=y\$.

The relative likelihood is their ratio, best expressed as a logarithm:

\$\$\Lambda(X,Y) = \log\left(\frac{f_{(X,Y)}(X,Y)}{f(X,Y)}\right) = -n\log(Y-X).\$\$

A small value of this is evidence for the hypothesis \$(L,U)=(0,1)\$; larger values are evidence against it. Of course if \$X \lt 0\$ or \$Y \gt 1\$ the hypothesis is controverted. But when the hypothesis is true, for large \$n\$ (greater than \$20\$ or so), \$2\Lambda(X,Y)\$ will have approximately a \$\chi^2(4)\$ distribution. Assuming \$X \ge 0\$ and \$Y \le 1\$, this enables you to reject the hypothesis when the chance of a \$\chi^2(4)\$ variable exceeding \$2\Lambda(X,Y)\$ becomes so small that the large value can no longer be attributed to chance alone.
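This recipe can be packaged as a short function (a sketch; the helper name `range.test` is my own, and it assumes the data are supplied as a numeric vector):

```r
# Approximate test of H0: the data are iid uniform(0,1).
# Uses 2*Lambda = -2*n*log(max - min), approximately chi^2(4) for large n.
range.test <- function(x) {
  if (min(x) < 0 || max(x) > 1)       # values outside [0,1] refute H0 outright
    return(list(statistic = Inf, p.value = 0))
  stat <- -2 * length(x) * log(diff(range(x)))
  list(statistic = stat,
       p.value = pchisq(stat, df = 4, lower.tail = FALSE))
}
set.seed(1)
range.test(runif(100))   # H0 true: the p-value should typically not be small
```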

I will not attempt to prove that the \$\chi^2(4)\$ distribution is the one to use; I will merely show that it works by simulating a large number of independent values of \$2\Lambda(X,Y)\$ when the hypothesis is true. Since you can simulate as many values as you like, let's take \$n=500\$ as an example. In a simulation of \$100{,}000\$ results for \$n=500\$ (figure omitted here), the red curve graphing the density of a \$\chi^2(4)\$ variable closely agrees with the histogram of the simulated values.

As a worked example consider the situation posed in the question where \$n=100\$, \$X= 0.51\$, and \$Y=0.69\$. Now

\$\$2\Lambda(0.51, 0.69) = -2(100\log(0.69 - 0.51)) = 343.\$\$

The corresponding \$\chi^2(4)\$ probability is less than \$10^{-72}\$: although we would never trust the accuracy of the \$\chi^2\$ approximation this far out in the tail (even with \$n=100\$ observations), this value is so small that these data certainly were not obtained from \$100\$ independent uniform\$(0,1)\$ variables!

In the second situation where \$X=0.01\$ and \$Y=0.99\$,

\$\$2\Lambda(0.01, 0.99) = -2(100\log(0.99 - 0.01)) = 4.04.\$\$

Now the \$\chi^2(4)\$ probability is \$0.40\$ (\$40\%\$), quite consistent with the hypothesis that \$(L,U)=(0,1)\$.
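Both worked examples can be reproduced in a few lines of R (my sketch, using the same numbers as above):

```r
# 2*Lambda and its chi^2(4) tail probability for the two scenarios in the question.
stat1 <- -2 * 100 * log(0.69 - 0.51)        # about 343
stat2 <- -2 * 100 * log(0.99 - 0.01)        # about 4.04
pchisq(stat1, df = 4, lower.tail = FALSE)   # astronomically small: reject (0,1)
pchisq(stat2, df = 4, lower.tail = FALSE)   # about 0.40: consistent with (0,1)
```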

BTW, here's `R` code to perform simulations. I have reset it to just \$10,000\$ iterations so that it will take less than one second to complete.

```r
n <- 500   # Sample size
N <- 1e4   # Number of simulation trials
lambda <- apply(matrix(runif(n * N), nrow = n), 2,
                function(x) -2 * n * log(diff(range(x))))
#
# Plot the results.
#
hist(lambda, freq=FALSE, breaks=seq(0, ceiling(max(lambda)), 1/4),
     border="#00000040", main="Histogram", xlab="2*Lambda")
curve(dchisq(x, 4), add=TRUE, col="Red", lwd=2)
```
