The *rule of three* is a method for calculating a 95% confidence interval when estimating $p$ from a set of $n$ IID Bernoulli trials with no successes.

My understanding from its derivation is that the confidence interval it produces, $[0,\frac{3}{n}]$, is the interval containing those values of $p$ for which

$$\text{Pr}\Big(\sum_{i=1}^n X_i=0\Big)\geq0.05.$$

That is to say, those values of $p$ such that the chance of having no successes in $n$ trials is greater than or equal to 5%.
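To make this concrete, here is a quick numerical check (an illustrative sketch, not part of the original derivation): solving $(1-p)^n = 0.05$ exactly and comparing the solution with $3/n$ for $n = 100$.

```r
# Check the rule-of-three endpoint: (1 - p)^n = 0.05 is the boundary
# condition, and p = 3/n is the usual approximation to its solution.
n <- 100
p_rule  <- 3 / n            # rule-of-three upper limit, 0.03
p_exact <- 1 - 0.05^(1 / n) # exact solution of (1 - p)^n = 0.05

(1 - p_rule)^n              # roughly 0.05, so p = 3/n sits near the boundary
p_exact                     # about 0.0295, close to 3/n
```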

However, in practice this approach does not produce confidence intervals that contain the true parameter 95% of the time.

For instance, suppose that $p=\frac{3}{n}+\varepsilon$. Then 5% of the time you will use the *rule of three* and your confidence interval will not contain $p$, and 95% of the time you will apply the standard CI procedure using the sample standard deviation, and that confidence interval will contain $p$ approximately 95% of the time.

Hence the probability that your confidence interval contains $p$ is

$$0.05 \cdot 0 + 0.95 \cdot 0.95 = 0.9025 \neq 0.95.$$

Am I misunderstanding the *rule of three*? My analysis shows that it does not function as a rule for producing a 95% confidence interval.


#### Best Answer

The image below is how I look at confidence intervals. It is adapted from an image in the answer to the question 'The basic logic of constructing a confidence interval', which is itself an adaptation of a figure in C. J. Clopper and E. S. Pearson, "The Use of Confidence or Fiducial Limits Illustrated in the Case of the Binomial", *Biometrika*, Vol. 26, No. 4 (Dec., 1934), pp. 404-413.

> For instance, suppose that $p=\frac{3}{n}+\varepsilon$. Then 5% of the time you will use the *rule of three* and your confidence interval will not contain $p$, and 95% of the time you will apply the standard CI procedure using the sample standard deviation, and this confidence interval will contain $p$ approximately 95% of the time.

**One-sided boundaries**

The situation of the rule of three, according to the derivation on Wikipedia, is closer to the image on the right, which is for *one-sided* intervals. The boundary $3/n$ corresponds to the situation in which you observe zero successes less than 5% of the time. If the true value were $3/n+\epsilon$, then you would make this error less than 5% of the time. The same holds for the other cases, with 1, 2, etc. observed successes: with one-sided boundaries the coverage is always correct. (You argue that in 95% of the cases you will be using the other interval, and that this interval covers $p$ only 95% of the time in those cases, but that is not true.)

**Two-sided boundaries**

For the two-sided situation, the confidence interval for 0 observations is not $[0, 3/n]$. For two-sided intervals one computes the boundaries such that the probability of falling outside them is at most 5% over *both* ends together. In the case of equal tails/sides, the boundary is computed such that $(1-p)^n \leq 0.025$, from which follows

$$p \approx -\log(1-p) = -\log(0.025)/n \approx 3.7/n.$$
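As a quick numerical sketch (assuming $n = 100$, the same sample size as in the plots below), the exact equal-tailed boundary from $(1-p)^n = 0.025$ can be compared with the $3.7/n$ approximation:

```r
# Two-sided boundary for zero successes: solve (1 - p)^n = 0.025 exactly
# and compare with the approximation p = -log(0.025)/n, about 3.7/n.
n <- 100
p_exact  <- 1 - 0.025^(1 / n) # exact solution of (1 - p)^n = 0.025
p_approx <- -log(0.025) / n   # approximately 3.7/n

c(p_exact, p_approx)          # both close to 0.037
```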

In the example above with $n = 100$ you see that $3.7/n$ works well. You also see a problem with these boundaries in the case of binomial distributions: due to the discreteness, the method will not give exactly 95% coverage for every value of $p$. Below is a plot of the coverage probability of the intervals as a function of the parameter value.

To be complete, we also plot the Jeffreys confidence interval, which BruceET mentions in his answer. It evens out the probabilities and makes the confidence interval smaller for $p$ close to $0$ and $1$. Conditional on the true value of the parameter, the Jeffreys interval does *not* always cover the parameter with at least 95% probability, but it is also not designed to do that.

```r
p_cover <- function(p, type = 1) {
  n <- 100
  k <- 0:n
  if (type == 1) {
    ### Clopper-Pearson interval boundaries
    p_upper <- qbeta(1 - 0.025, k + 1, n - k)
    p_lower <- qbeta(0.025, k, n - k + 1)
  } else {
    ### Jeffreys interval boundaries
    p_upper <- qbeta(1 - 0.025, k + 0.5, n - k + 0.5)
    p_lower <- qbeta(0.025, k + 0.5, n - k + 0.5)
  }
  ### which observed counts k have an interval covering p
  ks <- which((p <= p_upper) * (p >= p_lower) == 1)
  ### total probability of those counts under Binomial(n, p)
  sum(dbinom(ks - 1, n, p))
}
p_cover <- Vectorize(p_cover)

ps <- seq(0, 1, 0.0001)

plot(ps, p_cover(ps), type = "l", ylim = c(0.9, 1),
     xlab = "true value of p", ylab = "cover probability",
     main = "probability 95% Clopper-Pearson confidence interval covers true value of p",
     cex.main = 1)
lines(c(0, 1), c(0.95, 0.95), col = 2)

plot(ps, p_cover(ps, type = 2), type = "l", ylim = c(0.9, 1),
     xlab = "true value of p", ylab = "cover probability",
     main = "probability 95% Jeffreys confidence interval covers true value of p",
     cex.main = 1)
lines(c(0, 1), c(0.95, 0.95), col = 2)
```
