I would like to sample 1 million vectors from the set of vectors

$$\mathbf{v} = \{v_1, v_2, \ldots, v_n\}$$

subject to

- $\sum_{i=1}^n v_i = 1$
- $\forall i: v_i \ge 0$
- each $v_i$ has at most 2 decimal places

As an example, take $n = 10$. Effectively, I am choosing to divide $100$ items into $10$ portions such that there is no minimum number of items in any portion (meaning $0$ items in a portion is possible).

This is an interesting (albeit elementary) combinatorics problem. Suppose a rich man wants to divide his $100 billion wealth amongst his 10 sons, and suppose he wants each son to get at least $1 billion: the *i*th son should receive $n_i$ billion, where $n_i$ is an integer greater than 0. Imagine he has stacked his money in $1 billion piles in a row, with a gap between adjacent piles. Then there are 99 gaps between the 100 piles. To divide his money he simply chooses 9 cut points from the 99 gaps, so there are $\binom{99}{9}$ ways. In general, if he has $c billion and $d$ sons, there are $\binom{c-1}{d-1}$ ways to divide his wealth.

Now suppose he wants to divide his wealth with no minimum, so the *i*th son could get $n_i$ billion where $n_i$ is an integer greater than or equal to 0. How many ways are there to do this? Being clever, he asks a friend to lend him $10 billion and stacks those piles in the same row together with his other billions. Now there are 110 piles of $1 billion in a row, with 109 gaps between them. He chooses 9 cut points from these 109 gaps, dividing $110 billion into 10 portions, each with at least $1 billion. Finally he retrieves $1 billion from each portion and returns the $10 billion to his friend. The original $100 billion is now divided into 10 portions in such a way that there is no minimum amount!

So $\binom{109}{9}$ would give me the number of combinations.
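The cut-point construction can be checked with a short R sketch (this code is not part of the original question; `set.seed` is only there for reproducibility):

```r
# One draw of the "borrowed billions" construction: choose 9 cut points
# among the 109 gaps between 110 piles, then take back $1 billion from
# each of the 10 resulting portions.
set.seed(1)
cuts <- sort(sample(1:109, 9))          # 9 cut points out of 109 gaps
portions <- diff(c(0, cuts, 110)) - 1   # portion sizes after returning the loan
portions                                # 10 non-negative integers
sum(portions)                           # always 100
```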

In R, `combn(a, b)` would give me all the cut points from all possible combinations "a choose b". However, running `combn(109, 9)` is not possible on my computer.

In R, is there an efficient way to sample from the output of `combn`?


#### Best Answer

There are a couple of good answers here already, but none that directly implement the problem as posed in the question. The steps to do this are:

- Sample 9 of the 109 numbers from 0.01 to 1.09:
`sample(seq(0.01, 1.09, by=0.01), 9)`

- Sort them:
`sort`

- Place them in a vector between the minimum and maximum possible values:
`c(0, ..., 1.1)`

- Take the differences between consecutive values:
`diff`

- Subtract 0.01 from each resulting value, so that 0s are possible:
`- 0.01`
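Applied once, the steps above can be sketched as a single draw (the `set.seed` call is my addition, for reproducibility):

```r
set.seed(42)
cuts <- sort(sample(seq(0.01, 1.09, by=0.01), 9))  # sample 9 cut points, then sort
v <- diff(c(0, cuts, 1.1)) - 0.01                  # differences, then subtract 0.01
v        # 10 values, each with at most 2 decimal places
sum(v)   # 1, up to floating-point rounding
```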

To do this `N` times, the code can be wrapped up in `sapply`. The resulting code is:

```r
N <- 1e6
y <- sapply(1:N, function(i) {
  diff(c(0, sort(sample(seq(0.01, 1.09, by=0.01), 9)), 1.1)) - 0.01
})
mean(y)
hist(y, breaks=seq(-0.005, 1.005, by=0.01), freq=FALSE,
     main="Histogram of partial sums", xlab="Value")
```

Taking the mean and producing a histogram, as suggested by @whuber, yields a mean of 0.1 and the following histogram, consistent with his answer.

On my laptop (Core i7, 2.3 GHz), wrapping a `system.time` call around the `y <- ...` assignment gave an elapsed time of 124 s, i.e. just over 2 minutes.
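The timing measurement can be reproduced with a smaller `N` so the run finishes quickly; the 124 s figure above was for `N = 1e6`, and the choice of `N = 1e4` here is mine:

```r
N <- 1e4  # smaller than the 1e6 used above, to keep the run short
elapsed <- system.time(
  y <- sapply(1:N, function(i) {
    diff(c(0, sort(sample(seq(0.01, 1.09, by=0.01), 9)), 1.1)) - 0.01
  })
)["elapsed"]
elapsed
mean(y)   # close to 0.1: each of the N columns sums to 1 and has 10 entries
```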
