# Solved – How to generate samples uniformly at random from multiple discrete variables subject to constraints

I would like to set up a Monte Carlo process to fill an urn with N balls drawn from I colors C[i]. Each color C[i] has a minimum and a maximum number of balls that may be placed in the urn.

For example, I am trying to put 100 balls in the urn, and can fill it with four colors:

• Red – Minimum 0, Maximum 100 # NB, the actual maximum cannot be realized.
• Blue – Minimum 50, Maximum 100
• Yellow – Minimum 0, Maximum 50
• Green – Minimum 25, Maximum 75

How can I generate N samples that are guaranteed to be uniformly distributed across the possible outcomes?

I have seen solutions to this problem where the balls have no minimum or maximum, or alternatively, have the same implicit minima and maxima. See, for example, this discussion on a slightly different subject:

Generate uniformly distributed weights that sum to unity?

But I'm having problems generalizing this solution.


Let \$n_i\$ denote the number of balls of color \$C_i\$. Also, let \$m_i\$ and \$M_i\$ denote the minimum and the maximum number of balls of color \$C_i\$, respectively. We want to sample \$(n_1, \dots, n_I)\$ uniformly at random subject to the following constraints:

1. \$m_i \leq n_i \leq M_i\$
2. \$\sum_{i=1}^I n_i = N\$

First of all, you can remove the lower bound constraints (i.e. \$m_i \leq n_i\$) by placing \$m_i\$ balls of color \$C_i\$ in the urn up front. This transforms the two constraints as follows:

1. \$0 \leq n_i \leq b_i = M_i - m_i\$
2. \$\sum_{i=1}^I n_i = B = N - \sum_{i=1}^I m_i\$
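Applied to the example constraints from the question, the transformation works out as follows (a Python sketch; the variable names mirror the notation above):

```python
# Lower-bound removal for the question's example:
# N = 100 balls, colors (Red, Blue, Yellow, Green).
N = 100
m = [0, 50, 0, 25]      # minima  m_i
M = [100, 100, 50, 75]  # maxima  M_i

# Pre-place m_i balls of each color, then distribute only the remainder.
b = [Mi - mi for Mi, mi in zip(M, m)]  # new upper bounds b_i = M_i - m_i
B = N - sum(m)                         # balls left to distribute

print(b)  # [100, 50, 50, 50]
print(B)  # 25
```

Note that after the transformation only \$B = 25\$ balls remain to be placed, so upper bounds larger than \$B\$ (such as Red's) can never be attained, as the question already observes.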

Let \$P(n_1, \dots, n_I \mid B, b_{1:I})\$ denote the uniform distribution that we are interested in. We can use the chain rule and dynamic programming to sample from \$P\$ efficiently. First, we use the chain rule to write \$P\$ as follows:
\$\$
\begin{align}
P(n_1, \dots, n_I \mid B, b_{1:I}) &= P(n_I \mid B, b_{1:I}) \, P(n_1, \dots, n_{I-1} \mid n_I, B, b_{1:I}) \\
&= P(n_I \mid B, b_{1:I}) \, P(n_1, \dots, n_{I-1} \mid B - n_I, b_{1:I-1}) \quad (1)
\end{align}
\$\$
where \$P(n_I \mid B, b_{1:I}) = \sum_{n_1, \dots, n_{I-1}} P(n_1, \dots, n_I \mid B, b_{1:I})\$ is the marginal distribution of \$n_I\$. Note that \$P(n_I \mid B, b_{1:I})\$ is a discrete distribution and can be computed efficiently using dynamic programming. Also note that the second term in (1) has the same form as \$P\$ itself, so it can be handled recursively: we sample \$n_I\$ in the first round of the recursion, update the total number of balls to \$B - n_I\$, and recurse to sample \$n_{I-1}\$ in the next round.
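The whole scheme can be condensed into a short Python sketch of the same idea (the function names are mine, not from the original answer): `count(i, n)` is the memoized number of ways to place `n` balls using colors `1..i`, and each `n_i` is drawn proportionally to the number of completions it leaves possible.

```python
import random
from functools import lru_cache

def sample_counts(B, b, rng=random):
    """Sample (n_1, ..., n_I) uniformly from {0 <= n_i <= b[i], sum n_i = B}."""
    I = len(b)

    @lru_cache(maxsize=None)
    def count(i, n):
        # number of ways to place n balls using colors 1..i
        if n == 0:
            return 1
        if i == 0 or n < 0:
            return 0
        return sum(count(i - 1, n - ni) for ni in range(b[i - 1] + 1))

    counts = []
    remaining = B
    for i in range(I, 0, -1):
        # un-normalized marginal q(n_i): completions left for each choice of n_i
        q = [count(i - 1, remaining - ni) for ni in range(b[i - 1] + 1)]
        if sum(q) == 0:
            raise ValueError('constraints cannot be satisfied')
        # drawing n_i with probability proportional to q is exactly P(n_i | remaining)
        ni = rng.choices(range(len(q)), weights=q)[0]
        counts.append(ni)
        remaining -= ni
    counts.reverse()  # we sampled n_I first, so restore the order n_1, ..., n_I
    return counts
```

As a sanity check, with `b = [1, 1]` and `B = 2` the only feasible outcome is `(1, 1)`, and the sampler always returns it; uniformity over larger feasible sets can be checked empirically by tallying repeated draws.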

The following is a Matlab implementation of this idea. The complexity of the algorithm is \$O(I \times B \times K)\$, where \$K = \max_{i=1}^I b_i\$. The code uses randomly generated \$m_i\$ values in each run, so some of the generated test cases may not admit any valid solution, in which case the code prints a warning message.

```matlab
global dpm b

I = 5;   % number of colors
N = 300; % total number of balls

m = randi(50, 1, I) - 1; % minimum number of balls of each color
M = 99*ones(1, I);       % maximum number of balls of each color

% print original constraints
print_constraints(I, N, m, M, 'original constraints');

% remove the lower bound constraints
b = M - m;
B = N - sum(m);
m = zeros(size(m));

% print transformed constraints
print_constraints(I, B, zeros(1, I), b, 'transformed constraints');

% initialize the dynamic programming matrix (dpm);
% if dpm(i, n) ~= -1, it holds the (un-normalized) number of ways to
% place n balls using colors 1..i subject to the upper bound constraints
dpm = -ones(I, B);

% sample the number of balls of each color, one at a time, using the chain rule
running_B = B; % updated on the fly as we sample balls of different colors
for i = I : -1 : 1
    % compute the marginal distribution P(n_i);
    % instead of P(n_i) we compute q(n_i), which is un-normalized
    q_ni = zeros(1, b(i) + 1); % possibilities for n_i are 0, 1, ..., b(i)
    for ni = 0 : b(i)
        q_ni(ni+1) = dpfunc(i-1, running_B-ni);
    end
    if(sum(q_ni) == 0)
        fprintf('Impossible! The constraints cannot be satisfied!\n');
        return;
    end
    P_ni = q_ni / sum(q_ni);
    ni = discretesample(P_ni, 1) - 1;
    fprintf('n_%d = %d\n', i, ni);
    running_B = running_B - ni;
end
```

where the function `print_constraints` is

```matlab
function [] = print_constraints(I, N, m, M, note)
    fprintf('\n------ %s ------\n', note);
    fprintf('%d <= n_%d <= %d\n', [m; 1:I; M]);
    fprintf('========================\n');
    fprintf('sum_{i=1}^%d n_i = %d\n', I, N);
end
```

and the function `dpfunc` performs the dynamic programming computation as follows:

```matlab
function res = dpfunc(i, n)
    global dpm b

    % check boundary cases
    if(n == 0)
        res = 1;
        return;
    end
    if(i == 0) % gets here only if n ~= 0
        res = 0;
        return;
    end
    if(n < 0)
        res = 0;
        return;
    end

    if(dpm(i, n) == -1) % if ~= -1, it has been computed before, so just use it!
        % compute dpm(i, n) = sum over (n_1, ..., n_i) of valid(n, n_1, ..., n_i),
        % where "valid" returns 1 if sum_{j=1}^i n_j = n and 0 <= n_j <= b_j for
        % all j, and 0 otherwise
        dpm(i, n) = 0;
        for ni = 0 : b(i)
            dpm(i, n) = dpm(i, n) + dpfunc(i-1, n-ni);
        end
    end
    res = dpm(i, n);
end
```

and finally, the function discretesample(p, 1) draws a random sample from the discrete distribution \$p\$. You can find one implementation of this function here.
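If you don't have `discretesample` at hand, the standard replacement is inverse-CDF sampling: accumulate the weights and locate a uniform draw in the cumulative sums. A minimal Python sketch (the name `discrete_sample` is mine; in Matlab the same thing can be done with `cumsum` and `find`):

```python
import random
from itertools import accumulate
from bisect import bisect

def discrete_sample(p, rng=random):
    """Draw an index 0..len(p)-1 with probability proportional to p.

    Inverts the CDF: draw u ~ Uniform(0, total weight) and return the
    first index whose cumulative weight exceeds u.
    """
    cdf = list(accumulate(p))           # running totals of the weights
    u = rng.random() * cdf[-1]          # uniform draw scaled to the total
    return bisect(cdf, u)               # first index with cdf[index] > u
```

Zero-weight entries are never selected, since `bisect` skips past cumulative values that `u` has already reached.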
