Solved – Maximum gap between samples drawn without replacement from a discrete uniform distribution

This problem is related to my lab's research in robotic coverage:

Randomly draw $n$ numbers from the set ${1,2,ldots,m}$ without replacement, and sort the numbers in ascending order. $1le nle m$.

From this sorted list of numbers ${a_{(1)},a_{(2)},…,a_{(n)}}$, generate the difference between consecutive numbers and the boundaries: $g = {a_{(1)},a_{(2)}−a_{(1)},ldots,a_{(n)}−a_{(n-1)},m+1-a_{(n)}}$. This gives $n+1$ gaps.

What is the distribution of the maximum gap?

$P(max(g) = k) = P(k;m,n) = ? $

This can be framed using order statistics:
$P(g_{(n+1)} = k) = P(k;m,n) = ? $

See link for the distribution of gaps, but this question asks the distribution of the maximum gap.

I'd be satisfied with the average value, $mathbb{E}[g_{(n+1)}]$.

If $n=m$ all the gaps are size 1.
If $n+1 = m$ there is one gap of size $2$, and $n+1$ possible locations.
The maximum gap size is $m-n+1$, and this gap can be placed before or after any of the $n$ numbers, for a total of $n+1$ possible positions.
The smallest maximum gap size is $lceilfrac{m-n}{n+1}rceil$.
Define the probability of any given combination $T= {m choose n}^{-1}$.

I've partially solved the probability mass function as
P(g_{(n+1)} = k) = P(k;m,n) =
0 & k < lceilfrac{m-n}{n+1}rceil\
1 & k = frac{m-n}{n+1} \
1 & k = 1 text{ (occurs when $m=n$)} \
T(n+1)& k = 2 text{ (occurs when $m=n+1$)} \
T(n+1)& k = frac{m-(n-1)}{n} \
? & frac{m-(n-1)}{n} le k le m-n+1 \
T(n+1)& k = m-n+1\
0 & k > m-n+1
end{cases} tag{1}

Current work (1):
The equation for the first gap, $a_{(1)}$ is straightforward:
P(a_{(1)} = k) = P(k;m,n) = frac{1}{{m choose n}}
sum_{k=1}^{m-n+1} {m-k-1 choose n-1}

The expected value has a simple value:
$mathbb{E}[P(a_{(1)})] = frac{1}{
{m choose n}}
sum_{k=1}^{m-n+1} {m-k-1 choose n-1} k = frac{m-n}{1+n}$
By symmetry, I expect all $n$ gaps to have this distribution. Perhaps the solution could be found by drawing from this distribution $n$ times.

Current work (2): it is easy to run Monte Carlo simulations.

simMaxGap[m_, n_] := Max[Differences[Sort[Join[RandomSample[Range[m], n], {0, m+1}]]]]; m = 1000; n = 1; trials = 100000; SmoothHistogram[Table[simMaxGap[m, n], {trials}], Filling -> Axis, Frame -> {True, True, False, False}, FrameLabel -> {"k (Max gap)", "Probability"}, PlotLabel -> StringForm["m=``,n=``,smooth histogram of maximum map for `` trials", m, n, trials]] 

enter image description here

Let $f(g;n,m)$ be the chance that the minimum, $a_{(1)}$, equals $g$; that is, the sample consists of $g$ and an $n-1$-subset of ${g+1,g+2,ldots,m}$. There are $binom{m-g}{n-1}$ such subsets out of the $binom{m}{n}$ equally likely subsets, whence

$$Pr(a_{(1)}=g = f(g;n,m) = frac{binom{m-g}{n-1}}{binom{m}{n}}.$$

Adding $f(k;n,m)$ for all possible values of $k$ greater than $g$ yields the survival function

$$Pr(a_{(1)} gt g) = Q(g;n,m)= frac{(m-g)binom{m-g-1}{n-1}}{n binom{m}{n}}.$$

Let $G_{n,m}$ be the random variable given by the largest gap:

$$G_{n,m} = maxleft(a_{(1)}, a_{(2)}-a_{(1)}, ldots, a_{(n)}-a_{(n-1)}right).$$

(This responds to the question as originally framed, before it was modified to include a gap between $a_{(n)}$ and $m$.) We will compute its survival function $$P(g;n,m)=Pr(G_{n,m}gt g),$$ from which the entire distribution of $G_{n,m}$ is readily derived. The method is a dynamic program beginning with $n=1$, for which it is obvious that

$$P(g;1,m) = Pr(G_{1,m} gt 1) = frac{m-g}{m}, g=0, 1, ldots, m.tag{1}$$

For larger $ngt 1$, note that the event $G_{n,m}gt g$ is the disjoint union of the event

$$a_{1} gt g,$$

for which the very first gap exceeds $g$, and the $g$ separate events

$$a_{1}=ktext{ and } G_{n-1,m-k} gt g, k=1, 2, ldots, g$$

for which the first gap equals $k$ and a gap greater than $g$ occurs later in the sample. The Law of Total Probability asserts the probabilities of these events add, whence

$$P(g;n,m) = Q(g;n,m) + sum_{k=1}^g f(k;n,m) P(g;n-1,m-k).tag{2}$$

Fixing $g$ and laying out a two-way array indexed by $i=1,2,ldots,n$ and $j=1,2,ldots,m$, we may compute $P(g;n,m)$ by using $(1)$ to fill in its first row and $(2)$ to fill in each successive row using $O(gm)$ operations per row. Consequently the table can be completed in $O(gmn)$ operations and all tables for $g=1$ through $g=m-n+1$ can be constructed in $O(m^3n)$ operations.


These graphs show the survival function $gto P(g;n,64)$ for $n=1,2,4,8,16,32,64$. As $n$ increases, the graph moves to the left, corresponding to the decreasing chances of large gaps.

Closed formulas for $P(g;n,m)$ can be obtained in many special cases, especially for large $n$, but I have not been able to obtain a closed formula that applies to all $g,n,m$. Good approximations are readily available by replacing this problem with the analogous problem for continuous uniform variables.

Finally, the expectation of $G_{n,m}$ is obtained by summing its survival function starting at $g=0$:

$$mathbb{E}(G_{n,m}) = sum_{g=0}^{m-n+1} P(g;n,m).$$

Figure 2: contour plot of expectation

This contour plot of the expectation shows contours at $2, 4, 6, ldots, 32$, graduating from dark to light.

Similar Posts:

Rate this post

Leave a Comment