# Solved – In R, how to sample from the output of combn(a,b) if the “a choose b” is too large

I would like to sample 1 million vectors from the set of vectors

\$\$mathbf{v} = {v_1,v_2,…,v_n } \$\$

subject to

• \$sum_{i=1}^n v_i = 1\$
• \$forall i: v_i ge 0\$
• each \$v_i\$ has at most 2 decimal places

As an example I have \$n = 10\$. Effectively, I am choosing to divide \$100\$ items into \$10\$ portion such that there is no minimum number of items that can be in any portion (meaning \$0\$ item in a portion is possible).

This is an interesting combinatorics (albeit elementary) problem. Suppose a rich man wants to divide his \$100 billion wealth amongst his 10 sons. Suppose he wants each son to get at least \$1 billion and the *i*th son should receive \$n\$ billion where \$n\$ is an integer greater than 0. Imagine he has stacked his money in \$1 billion piles in a row with some gap between each pile. Then he would have 99 gaps between the piles. To divide his money he has to simply choose 9 cut points from the 99 gaps. So it's 99 choose 9. It general if he has \$c billion and d sons, it would be c-1 choose d ways to divide his wealth. Now suppose he wants to divide his wealth so that there is no minimum and the *i*th son could get \$n_i\$ billion where \$n_i\$ is an integer greater than or equal to 0. How many ways are there to do this? He was clever and asked his friend to lend him \$10 billion and he also stacked them into a row together with his other billions. Now there are 110 piles of \$1 billion dollars in a row and there are 109 gaps. He chooses 9 cut points from these 109 gaps. Now he has divided \$110 billion into 10 portions each with at least \$1 billion. Now he retrieves \$1 billion from each portion, and returns the \$10 to his friend. Now the original \$100 billion is divided into 10 piles in such a way that there is no minimum amount!

So \$109\$ choose \$9\$ would give me the combinations.

In R, `combn(a,b)` would give me all the cuts points from all possible combinations "a choose b". However running `combn(109,9)` is not possible on my computer.

In R, is there an efficient way to sample from the output of `combn`?

Contents

There are a couple of good answers here already, but none which directly implement the problem as suggested in the question. The steps to do this are:

• Sample 9 out of the 109 numbers from 0.01 to 1.09: `sample(seq(0.01, 1.09, by=0.01), 9)`
• Find the differences between them:
• Sort them: `sort`
• Place them in a vector between the minimum and maximum possible values: `c(0, ..., 1.1)`
• Difference: `diff`
• Remove 0.01 from each resulting value, so that 0s are possible: `- 0.01`

To do this `N` times, the code can be wrapped up in `sapply`. The resulting code is:

``N = 1e6 y <- sapply( 1:N, function(i) {     diff( c(0,              sort(sample(seq(0.01, 1.09, by=0.01), 9)),             1.1) ) - 0.01 } ) mean(y) hist(y, breaks=seq(-0.005, 1.005, by=0.01), freq=FALSE,      main="Histogram of partial sums", xlab="Value") ``

Taking the mean and producing a histogram is suggested by @whuber and yields a mean of 0.1 and the following histogram, consistent with his answer.

On my latop (core-i7, 2.3Ghz), wrapping a `system.time` call around `y->...` gave an elapsed time of 124s, i.e. just over 2 minutes.

Rate this post