I have a dataset containing 45 observations. I want to sample from this dataset repeatedly, with a sample size of 35 each time, so on each draw I delete 10 data points from the original dataset. In total there are $\binom{45}{10}$ possible ways to delete the 10 points, and I want all of these possibilities to be available during the resampling.

Can someone help me program this? I must add that enumerating all $\binom{45}{10}$ combinations seems not to work in R because it exceeds the memory capacity.

Many thanks in advance!


#### Best Answer

It is rare in statistical analysis that such code is truly needed (and it is not needed for the Jackknife), but here (for the record) is a solution.
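To see the scale of the asker's memory problem, here is a back-of-envelope Python check (not part of the original answer): materializing every subsample explicitly is hopeless, which is why the solution below unranks integers instead.

```python
from math import comb

total = comb(45, 10)
print(total)                           # 3190187286 distinct ways to delete 10 of 45
# Storing all 35-point subsamples at even one byte per kept index:
print(round(total * 35 / 1e9), "GB")   # roughly 112 GB, far beyond typical memory
```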

A reliable method to generate a non-duplicating collection of $k$-subsets of $n$ things is to associate a unique combination with each integer $x$ in the set $\{0, 1, \ldots, \binom{n}{k}-1\}$. This can be done by walking upwards in Pascal's Triangle back to its apex, exploiting the relation $\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}$, and picking elements at each row where you move left (from $k$ to $k-1$).

Start with the triple $(x, n, k)$ with $0 \le x \lt \binom{n}{k}$ and $0 \le k \le n$. When $x \lt \binom{n-1}{k-1}$, select the first of the $n$ elements, leave $x$ unchanged, and proceed to select $k-1$ out of the remaining $n-1$ elements based on the triple $(x, n-1, k-1)$. Otherwise, do not select the first of the $n$ elements, decrement $x$ to $$x^\prime = x - \binom{n-1}{k-1},$$ and proceed to select $k$ out of $n-1$ elements based on the triple $(x^\prime, n-1, k)$. In any event, when $n = k$, pick all $k$ remaining elements.
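A direct Python port of this walk may make it concrete (`choose_int` is a hypothetical name; the answer's own R version appears below). One wrinkle: R's `choose(n-1, -1)` silently returns $0$, but Python's `math.comb` raises on negative arguments, so the port needs an explicit guard for $k = 0$.

```python
from math import comb

def choose_int(x, n, k):
    """Unrank x in [0, comb(n, k)) to a k-subset of n items, as a Boolean mask."""
    if k == 0:
        return [False] * n          # nothing left to pick: skip all remaining elements
    if n <= k:
        return [True] * k           # every remaining element must be picked
    u = comb(n - 1, k - 1)          # number of codes whose subset contains the first element
    if x < u:
        return [True] + choose_int(x, n - 1, k - 1)
    return [False] + choose_int(x - u, n - 1, k)

# Unrank code 7 among the comb(5, 3) = 10 three-subsets of five things.
print(choose_int(7, 5, 3))          # → [False, True, True, False, True]
```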

The proof that this works depends on some easily established invariants:

- At the outset (and at every recursive call), $0 \le x \lt \binom{n}{k}$.
- Exactly $k$ elements will be picked.

You can check that the following code satisfies these invariants.

```r
choose.int <- function(x, n, k) {
  if (n <= k) return(rep(TRUE, k))
  u <- choose(n-1, k-1)            # number of codes that pick the first element
  pick <- x < u
  if (pick) y <- choose.int(x, n-1, k-1) else y <- choose.int(x-u, n-1, k)
  return(c(pick, y))
}
```

The proof that it generates all such subsets, without duplication, can be accomplished with an inductive argument on $n$.
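For small $n$ the claim can also be checked by brute force. This standalone Python sketch (same recursion, under the hypothetical name `choose_int`) confirms that the $\binom{n}{k}$ codes produce all $\binom{n}{k}$ subsets with no duplication:

```python
from math import comb
from itertools import combinations  # used only to count the ground truth

def choose_int(x, n, k):
    # Same recursion as the R code, returning a hashable tuple mask.
    if k == 0:
        return (False,) * n
    if n <= k:
        return (True,) * k
    u = comb(n - 1, k - 1)
    if x < u:
        return (True,) + choose_int(x, n - 1, k - 1)
    return (False,) + choose_int(x - u, n - 1, k)

n, k = 7, 3
subsets = {choose_int(x, n, k) for x in range(comb(n, k))}
assert len(subsets) == comb(n, k)               # every code yields a distinct subset
assert all(sum(s) == k for s in subsets)        # each subset picks exactly k elements
assert len(subsets) == len(list(combinations(range(n), k)))  # matches ground truth
```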

Thus, to select (say) $1000$ unique $10$-subsets of $45$ things, obtain a random sample *without replacement* from the set $\{0, 1, \ldots, \binom{45}{10}-1\}$. Using `choose.int`, convert each value $x$ in this sample into a vector indicating which of the $45$ elements to select:

```r
n <- 45; k <- 10
sample <- sapply(sample.int(choose(n, k), 1000) - 1, choose.int, n=n, k=k)
```
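For readers working outside R, the same two-step recipe (draw distinct integers without replacement, then unrank each) can be sketched in Python, with `random.sample` playing the role of `sample.int`. This is a hypothetical port, not the answer's code; `choose_int` is the Python version of the recursion above.

```python
import random
from math import comb

def choose_int(x, n, k):
    # Python port of the R recursion, with a k == 0 guard (R's choose(n-1, -1) is 0).
    if k == 0:
        return [False] * n
    if n <= k:
        return [True] * k
    u = comb(n - 1, k - 1)
    if x < u:
        return [True] + choose_int(x, n - 1, k - 1)
    return [False] + choose_int(x - u, n - 1, k)

n, k = 45, 10
# 1000 distinct codes drawn without replacement from range(comb(45, 10)).
codes = random.sample(range(comb(n, k)), 1000)
masks = [choose_int(x, n, k) for x in codes]
assert all(sum(m) == k for m in masks)            # each mask keeps exactly 10 of 45
assert len({tuple(m) for m in masks}) == 1000     # distinct codes give distinct subsets
```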

Timing indicates about $10,000$ such subsets can be generated per second (because `R` is not efficient with recursive functions).
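Since the slowdown is blamed on recursion, one workaround is to unroll the walk into an explicit loop; the same idea transfers directly to R. A hypothetical iterative Python version:

```python
from math import comb

def choose_int_iter(x, n, k):
    """Iterative form of the unranking walk: no recursion, same output."""
    mask = []
    while k > 0 and n > k:
        u = comb(n - 1, k - 1)
        if x < u:                  # pick the current element, keep x
            mask.append(True)
            k -= 1
        else:                      # skip it, subtract the block of picking codes
            mask.append(False)
            x -= u
        n -= 1
    # Either all remaining elements are forced in (n == k) or forced out (k == 0).
    mask.extend([True] * k if k else [False] * n)
    return mask

print(choose_int_iter(7, 5, 3))    # → [False, True, True, False, True]
```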

Because this is a largish array, let's illustrate with smaller values. How about picking $6$ of the $10 = \binom{5}{3}$ three-subsets of five things:

```r
n <- 5; k <- 3
sapply(sample.int(choose(n, k), 6) - 1, choose.int, n=n, k=k)
```

The output will vary with the random seed, but in one case it was

```
      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]
[1,]  TRUE  TRUE FALSE  TRUE  TRUE FALSE
[2,]  TRUE FALSE  TRUE  TRUE FALSE  TRUE
[3,] FALSE FALSE  TRUE  TRUE  TRUE  TRUE
[4,]  TRUE  TRUE FALSE FALSE FALSE  TRUE
[5,] FALSE  TRUE  TRUE FALSE  TRUE FALSE
```

Each column indicates which elements to include in the sample. Every column has exactly $3$ `TRUE` values for the elements picked, and no two columns are the same.
