# Solved – Revisiting the Rule of Three

The rule of three is a method for calculating a 95% confidence interval when estimating $$p$$ from a set of $$n$$ IID Bernoulli trials with no successes.

My understanding from its derivation is that the confidence interval it produces, $$[0, \frac{3}{n}]$$, is the interval containing those values of $$p$$ such that

$$\Pr\Big(\sum_{i=1}^n X_i = 0\Big) \geq 0.05.$$

That is to say, those values of $$p$$ such that the chance of having no successes in $$n$$ trials is greater than or equal to 5%.
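Numerically, the boundary $$3/n$$ comes from $$(1-3/n)^n \approx e^{-3} \approx 0.05$$. A quick Python check of this (the choice $$n = 100$$ is an illustrative assumption, not from the post):

```python
import math

# Illustrative check: at p = 3/n, the probability of zero successes in
# n Bernoulli trials is (1 - 3/n)^n, which tends to e^{-3} ~ 0.0498.
n = 100
p = 3 / n
prob_zero = (1 - p) ** n      # exact probability of no successes at p = 3/n
print(prob_zero)              # close to 5%
print(math.exp(-3))           # the limiting value ~ 0.0498
```

So $$3/n$$ is (approximately) the largest $$p$$ for which zero successes is still a 5%-plausible outcome.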

However, in practice this approach does not produce confidence intervals that contain the true parameter 95% of the time.

For instance, suppose that $$p=\frac{3}{n}+\varepsilon$$. Then 5% of the time you will use the rule of three and your confidence interval will not contain $$p$$, and 95% of the time you will apply the standard CI procedure using the sample standard deviation, and this confidence interval will contain $$p$$ approximately 95% of the time.

Hence the probability that your confidence interval contains $$p$$ is

$$0.05 \cdot 0 + 0.95^2 = 0.9025 \neq 0.95.$$
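The arithmetic above can be sketched numerically. This Python snippet adopts the question's own assumption that the standard-CI branch covers exactly 95% of the time; the values $$n = 100$$ and $$\varepsilon = 0.001$$ are illustrative, not from the post:

```python
# Sketch of the question's argument, under its assumption that the
# standard CI covers p exactly 95% of the time whenever it is used.
n = 100
p = 3 / n + 0.001             # hypothetical true p just above 3/n
pr_zero = (1 - p) ** n        # rule-of-three branch: interval [0, 3/n] misses p
coverage = (1 - pr_zero) * 0.95   # standard-CI branch: covers 95% (assumed)
print(coverage)               # noticeably below 0.95
```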

Am I misunderstanding the rule of three? My analysis shows that it does not function as a rule for producing a 95% confidence interval.


The image below is how I look at confidence intervals. It is an adaptation of an image in the answer to the question 'The basic logic of constructing a confidence interval', which is itself an adaptation of C. J. Clopper and E. S. Pearson, "The Use of Confidence or Fiducial Limits Illustrated in the Case of the Binomial", Biometrika, Vol. 26, No. 4 (Dec. 1934), pp. 404–413.


• One-sided boundaries

The situation of the rule of three, according to the derivation on Wikipedia, is closer to the image on the right, which is for one-sided intervals. The boundary $$3/n$$ is the value at which you observe zero successes less than 5% of the time. If the true value were $$3/n+\varepsilon$$, then you would make this error less than 5% of the time. In the other cases (observations of 1, 2, etc. successes) the one-sided boundary will always contain the true value. (You argue that in 95% of the cases you will be using the other interval and that it errs 5% of the time in those cases; for one-sided boundaries that is not true.)
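A small Python check of this point (the values of $$n$$ and $$\varepsilon$$ are illustrative assumptions, not from the answer): for a true $$p$$ just above $$3/n$$, observing zero successes, which is the only outcome for which the one-sided interval $$[0, 3/n]$$ excludes $$p$$, happens less than 5% of the time.

```python
# Illustrative: for p slightly above 3/n, the rule-of-three interval
# [0, 3/n] fails to cover p only when zero successes are observed,
# and that event has probability below 5%.
n = 100
for eps in (0.001, 0.005, 0.01):
    p = 3 / n + eps
    prob_zero = (1 - p) ** n   # chance the rule of three is (wrongly) triggered
    print(f"p = {p:.3f}: Pr(zero successes) = {prob_zero:.4f}")
```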

• Two-sided boundaries

For two-sided intervals the confidence interval for 0 observations does not end at $$3/n$$. One computes the boundaries such that the probability of ending up outside them is at most 5% over both ends together. In the case of equal tails, the upper boundary for 0 observations is computed such that $$(1-p)^n \leq 0.025$$, from which follows $$p \approx -\log(1-p) = -\log(0.025)/n \approx 3.7/n.$$
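The computation above can be checked numerically; a short Python sketch (the choice $$n = 100$$ is illustrative):

```python
import math

# With zero successes, the equal-tailed two-sided upper bound solves
# (1 - p)^n = 0.025 exactly. Since -log(1 - p) ~ p for small p, this is
# approximately p = -log(0.025)/n ~ 3.69/n.
n = 100
exact = 1 - 0.025 ** (1 / n)    # exact solution of (1 - p)^n = 0.025
approx = -math.log(0.025) / n   # the ~3.7/n approximation
print(exact, approx)            # the two agree closely
```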

In the example above with $$n = 100$$ you see that the boundary $$3.7/n$$ works well. You also see a problem with these boundaries in the case of binomial distributions: due to the discreteness, the method will not give exactly 95% coverage for every value of $$p$$. Below is a plot of the coverage probability of the intervals as a function of the parameter value.

To be complete, we also plot the Jeffreys interval, which BruceET mentions in his answer. It evens out the probabilities and makes the confidence interval smaller for $$p$$ close to $$0$$ and $$1$$. Conditional on the true value of the parameter, the Jeffreys interval does not always cover the parameter with at least 95% probability, but it is also not designed to do that.

```r
p_cover <- function(p, type = 1) {
  n = 100
  k = 0:n
  if (type == 1) { ### Clopper-Pearson
    p_upper = qbeta(1 - 0.025, k + 1, n - k)
    p_lower = qbeta(0.025, k, n - k + 1)
  } else {         ### Jeffreys
    p_upper = qbeta(1 - 0.025, k + 0.5, n - k + 0.5)
    p_lower = qbeta(0.025, k + 0.5, n - k + 0.5)
  }
  # outcomes k whose interval covers this p
  ks <- which((p <= p_upper) * (p >= p_lower) == 1)
  sum(dbinom(ks - 1, n, p))
}
p_cover <- Vectorize(p_cover)

ps <- seq(0, 1, 0.0001)

plot(ps, p_cover(ps), type = "l", ylim = c(0.9, 1),
     xlab = "true value of p", ylab = "cover probability",
     main = "probability 95% Clopper Pearson confidence interval covers true value of p",
     cex.main = 1)
lines(c(0, 1), c(0.95) * c(1, 1), col = 2)

plot(ps, p_cover(ps, type = 2), type = "l", ylim = c(0.9, 1),
     xlab = "true value of p", ylab = "cover probability",
     main = "probability 95% Jeffreys' confidence interval covers true value of p",
     cex.main = 1)
lines(c(0, 1), c(0.95) * c(1, 1), col = 2)
```
