Solved – In R, how to sample from the output of combn(a,b) if the “a choose b” is too large

I would like to sample 1 million vectors from the set of vectors

$$mathbf{v} = {v_1,v_2,…,v_n } $$

subject to

  • $sum_{i=1}^n v_i = 1$
  • $forall i: v_i ge 0$
  • each $v_i$ has at most 2 decimal places

As an example I have $n = 10$. Effectively, I am choosing to divide $100$ items into $10$ portion such that there is no minimum number of items that can be in any portion (meaning $0$ item in a portion is possible).

This is an interesting combinatorics (albeit elementary) problem. Suppose a rich man wants to divide his $100 billion wealth amongst his 10 sons. Suppose he wants each son to get at least $1 billion and the *i*th son should receive $n$ billion where $n$ is an integer greater than 0. Imagine he has stacked his money in $1 billion piles in a row with some gap between each pile. Then he would have 99 gaps between the piles. To divide his money he has to simply choose 9 cut points from the 99 gaps. So it's 99 choose 9. It general if he has $c billion and d sons, it would be c-1 choose d ways to divide his wealth. Now suppose he wants to divide his wealth so that there is no minimum and the *i*th son could get $n_i$ billion where $n_i$ is an integer greater than or equal to 0. How many ways are there to do this? He was clever and asked his friend to lend him $10 billion and he also stacked them into a row together with his other billions. Now there are 110 piles of $1 billion dollars in a row and there are 109 gaps. He chooses 9 cut points from these 109 gaps. Now he has divided $110 billion into 10 portions each with at least $1 billion. Now he retrieves $1 billion from each portion, and returns the $10 to his friend. Now the original $100 billion is divided into 10 piles in such a way that there is no minimum amount!

So $109$ choose $9$ would give me the combinations.

In R, combn(a,b) would give me all the cuts points from all possible combinations "a choose b". However running combn(109,9) is not possible on my computer.

In R, is there an efficient way to sample from the output of combn?

There are a couple of good answers here already, but none which directly implement the problem as suggested in the question. The steps to do this are:

  • Sample 9 out of the 109 numbers from 0.01 to 1.09: sample(seq(0.01, 1.09, by=0.01), 9)
  • Find the differences between them:
    • Sort them: sort
    • Place them in a vector between the minimum and maximum possible values: c(0, ..., 1.1)
    • Difference: diff
  • Remove 0.01 from each resulting value, so that 0s are possible: - 0.01

To do this N times, the code can be wrapped up in sapply. The resulting code is:

N = 1e6 y <- sapply( 1:N, function(i) {     diff( c(0,              sort(sample(seq(0.01, 1.09, by=0.01), 9)),             1.1) ) - 0.01 } ) mean(y) hist(y, breaks=seq(-0.005, 1.005, by=0.01), freq=FALSE,      main="Histogram of partial sums", xlab="Value") 

Taking the mean and producing a histogram is suggested by @whuber and yields a mean of 0.1 and the following histogram, consistent with his answer.

enter image description here

On my latop (core-i7, 2.3Ghz), wrapping a system.time call around y->... gave an elapsed time of 124s, i.e. just over 2 minutes.

Similar Posts:

Rate this post

Leave a Comment