# Solved – Maximum gap between samples drawn without replacement from a discrete uniform distribution

This problem is related to my lab's research in robotic coverage:

Randomly draw $$n$$ numbers from the set $${1,2,ldots,m}$$ without replacement, and sort the numbers in ascending order. $$1le nle m$$.

From this sorted list of numbers $${a_{(1)},a_{(2)},…,a_{(n)}}$$, generate the difference between consecutive numbers and the boundaries: $$g = {a_{(1)},a_{(2)}−a_{(1)},ldots,a_{(n)}−a_{(n-1)},m+1-a_{(n)}}$$. This gives $$n+1$$ gaps.

What is the distribution of the maximum gap?

$$P(max(g) = k) = P(k;m,n) = ?$$

This can be framed using order statistics:
$$P(g_{(n+1)} = k) = P(k;m,n) = ?$$

See link for the distribution of gaps, but this question asks the distribution of the maximum gap.

I'd be satisfied with the average value, $$mathbb{E}[g_{(n+1)}]$$.

If $$n=m$$ all the gaps are size 1.
If $$n+1 = m$$ there is one gap of size $$2$$, and $$n+1$$ possible locations.
The maximum gap size is $$m-n+1$$, and this gap can be placed before or after any of the $$n$$ numbers, for a total of $$n+1$$ possible positions.
The smallest maximum gap size is $$lceilfrac{m-n}{n+1}rceil$$.
Define the probability of any given combination $$T= {m choose n}^{-1}$$.

I've partially solved the probability mass function as
$$P(g_{(n+1)} = k) = P(k;m,n) = begin{cases} 0 & k < lceilfrac{m-n}{n+1}rceil\ 1 & k = frac{m-n}{n+1} \ 1 & k = 1 text{ (occurs when m=n)} \ T(n+1)& k = 2 text{ (occurs when m=n+1)} \ T(n+1)& k = frac{m-(n-1)}{n} \ ? & frac{m-(n-1)}{n} le k le m-n+1 \ T(n+1)& k = m-n+1\ 0 & k > m-n+1 end{cases} tag{1}$$

Current work (1):
The equation for the first gap, $$a_{(1)}$$ is straightforward:
$$P(a_{(1)} = k) = P(k;m,n) = frac{1}{{m choose n}} sum_{k=1}^{m-n+1} {m-k-1 choose n-1}$$
The expected value has a simple value:
$$mathbb{E}[P(a_{(1)})] = frac{1}{ {m choose n}} sum_{k=1}^{m-n+1} {m-k-1 choose n-1} k = frac{m-n}{1+n}$$.
By symmetry, I expect all $$n$$ gaps to have this distribution. Perhaps the solution could be found by drawing from this distribution $$n$$ times.

Current work (2): it is easy to run Monte Carlo simulations.

``simMaxGap[m_, n_] := Max[Differences[Sort[Join[RandomSample[Range[m], n], {0, m+1}]]]]; m = 1000; n = 1; trials = 100000; SmoothHistogram[Table[simMaxGap[m, n], {trials}], Filling -> Axis, Frame -> {True, True, False, False}, FrameLabel -> {"k (Max gap)", "Probability"}, PlotLabel -> StringForm["m=``,n=``,smooth histogram of maximum map for `` trials", m, n, trials]] `` Contents

Let \$f(g;n,m)\$ be the chance that the minimum, \$a_{(1)}\$, equals \$g\$; that is, the sample consists of \$g\$ and an \$n-1\$-subset of \${g+1,g+2,ldots,m}\$. There are \$binom{m-g}{n-1}\$ such subsets out of the \$binom{m}{n}\$ equally likely subsets, whence

\$\$Pr(a_{(1)}=g = f(g;n,m) = frac{binom{m-g}{n-1}}{binom{m}{n}}.\$\$

Adding \$f(k;n,m)\$ for all possible values of \$k\$ greater than \$g\$ yields the survival function

\$\$Pr(a_{(1)} gt g) = Q(g;n,m)= frac{(m-g)binom{m-g-1}{n-1}}{n binom{m}{n}}.\$\$

Let \$G_{n,m}\$ be the random variable given by the largest gap:

\$\$G_{n,m} = maxleft(a_{(1)}, a_{(2)}-a_{(1)}, ldots, a_{(n)}-a_{(n-1)}right).\$\$

(This responds to the question as originally framed, before it was modified to include a gap between \$a_{(n)}\$ and \$m\$.) We will compute its survival function \$\$P(g;n,m)=Pr(G_{n,m}gt g),\$\$ from which the entire distribution of \$G_{n,m}\$ is readily derived. The method is a dynamic program beginning with \$n=1\$, for which it is obvious that

\$\$P(g;1,m) = Pr(G_{1,m} gt 1) = frac{m-g}{m}, g=0, 1, ldots, m.tag{1}\$\$

For larger \$ngt 1\$, note that the event \$G_{n,m}gt g\$ is the disjoint union of the event

\$\$a_{1} gt g,\$\$

for which the very first gap exceeds \$g\$, and the \$g\$ separate events

\$\$a_{1}=ktext{ and } G_{n-1,m-k} gt g, k=1, 2, ldots, g\$\$

for which the first gap equals \$k\$ and a gap greater than \$g\$ occurs later in the sample. The Law of Total Probability asserts the probabilities of these events add, whence

\$\$P(g;n,m) = Q(g;n,m) + sum_{k=1}^g f(k;n,m) P(g;n-1,m-k).tag{2}\$\$

Fixing \$g\$ and laying out a two-way array indexed by \$i=1,2,ldots,n\$ and \$j=1,2,ldots,m\$, we may compute \$P(g;n,m)\$ by using \$(1)\$ to fill in its first row and \$(2)\$ to fill in each successive row using \$O(gm)\$ operations per row. Consequently the table can be completed in \$O(gmn)\$ operations and all tables for \$g=1\$ through \$g=m-n+1\$ can be constructed in \$O(m^3n)\$ operations. These graphs show the survival function \$gto P(g;n,64)\$ for \$n=1,2,4,8,16,32,64\$. As \$n\$ increases, the graph moves to the left, corresponding to the decreasing chances of large gaps.

Closed formulas for \$P(g;n,m)\$ can be obtained in many special cases, especially for large \$n\$, but I have not been able to obtain a closed formula that applies to all \$g,n,m\$. Good approximations are readily available by replacing this problem with the analogous problem for continuous uniform variables.

Finally, the expectation of \$G_{n,m}\$ is obtained by summing its survival function starting at \$g=0\$:

\$\$mathbb{E}(G_{n,m}) = sum_{g=0}^{m-n+1} P(g;n,m).\$\$ This contour plot of the expectation shows contours at \$2, 4, 6, ldots, 32\$, graduating from dark to light.

Rate this post