Suppose I have N probabilities $(p_1, p_2,…,p_N)$ that represent the chance that each that a corresponding test was passed. How do I apply the Bernoulli distribution to determine the expected number of passes?
Best Answer
If you add up $n$ Bernouilli random variables with a different success probabilities then the sum has a Binomial Distribution of Poisson.
If the success probabilities are $p_i$ then the mean of the Binomial of Poisson is $n bar{p}$ where $bar{p}=frac{sum_i p_i}{n}$ and the variance of the Binomial distribution of Poisson is $n bar{p} (1-bar{p}) – n times var(p)$ (Note: for $var(p)$ there must be $n$ in the denominator!).
Note that if $var(p)$ is relatively small then the variance reduces to a binomial variance, which was expected because a small $var(p)$ means that the success probabilities of the Bernouillis are more or less equal.
I have some code to simulate this:
# Function that simulates Poisson Binomial random variable # # parameter 'ps' contains the success probabilities of the Bernouilli's to add up # parameter n.sim is the number of simulations # # The return value is a list containing # - the simulated mean # - the simulated variance # - the 'true' mean of Poisson Binomial namely n x average(ps) # - the true variance of Poisson Binomial namely n x average(ps) x (1-average(ps)) - n var(ps) simulate.Poisson_Binomial<-function(ps, n.sim=100000) { sum.all<-vector(mode="numeric", length=n.sim) for ( i in 1:n.sim ) { # generate the random outcome for each Bernouilli random.outcome<-( runif(n=length(ps)) <= ps ) # count the number of successes sum.all[i]<-sum(random.outcome) } ret<-list(sim.mean=mean(sum.all), sim.var=var(sum.all), PoisBin.mean=length(ps)*mean(ps), PoisBin.var=length(ps)*mean(ps)*(1-mean(ps))-(length(ps)-1)*var(ps)) return(ret) } # Generate 50 Bernouilli success probabilities set.seed(1) N<-50 ps<-runif(n=N) # do the simulation simulate.Poisson_Binomial(ps=ps, n.sim=5e5)
In the return of the R-function there is (length(ps)-1)*var(ps)
this is because the R-function var()
has (n-1) in the denominator. so $-n times var(p)$ in the formula above should be 'translated' to -length(ps) * ( var(ps) * (length(ps)-1)/length(ps)
which becomes - var(ps) * (length(ps)-1)
See also this Intuitive explanation for dividing by $n-1$ when calculating standard deviation?
Similar Posts:
- Solved – Relationship between Poisson, binomial, negative binomial distributions and normal distribution
- Solved – When do you know when modeling something with a binomial distribution is better than a Poisson
- Solved – Expected Value and Most Likely Outcome
- Solved – Logistic regression: what is the link between the binomial family and the binomial distribution
- Solved – Finding the uncertainty in a Binomial probability estimate