I have a p-value that I generate via resampling.
Resamples = 5000
Positive findings = 1000
P-value = 1000/5000 = 0.2
How can I compute the 95% confidence interval for this p-value?
I would assume it's a function of the number of positive findings and the number of resamples (1000 and 5000 in this case, respectively).
Is the answer p-value ± 1.96*sqrt(1000/5000 * (1 - 1000/5000) / 1000)?
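For reference, if each resample is treated as an independent Bernoulli trial with success probability $p$, the usual normal-approximation (Wald) interval divides by the number of resamples $n$, not by the number of positives $k$. With $k = 1000$ and $n = 5000$ this works out as:

$$
\hat p \pm 1.96\sqrt{\frac{\hat p\,(1-\hat p)}{n}}
= 0.2 \pm 1.96\sqrt{\frac{0.2 \times 0.8}{5000}}
\approx 0.2 \pm 0.0111,
$$

i.e. roughly $[0.189, 0.211]$.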
Why this matters: Resampling is very expensive in my code, and I'd like to stop resampling as soon as the 95% confidence interval for the bootstrapped p-value no longer includes 0.05. Right now I'm doing a million resamples to estimate every p-value, and it is very slow.
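A minimal sketch of the batch-and-stop loop described here, assuming the normal-approximation interval (standard error divided by the number of resamples). The callback `resample_is_positive`, the batch size of 1000 and the function names are hypothetical stand-ins, not part of the original code.

```python
import math
import random
from typing import Callable, Tuple


def wald_interval(positives: int, resamples: int, z: float = 1.96) -> Tuple[float, float]:
    # Normal-approximation interval; the standard error divides by the
    # number of resamples because each resample is one Bernoulli trial.
    p_hat = positives / resamples
    se = math.sqrt(p_hat * (1.0 - p_hat) / resamples)
    return p_hat - z * se, p_hat + z * se


def sequential_p_value(
    resample_is_positive: Callable[[], bool],
    alpha: float = 0.05,
    batch: int = 1000,
    max_resamples: int = 1_000_000,
) -> Tuple[float, int, Tuple[float, float]]:
    """Resample in batches and stop once the interval excludes alpha.

    resample_is_positive is a hypothetical callback that runs one
    resample and returns True when it counts as a positive finding.
    """
    positives = resamples = 0
    lo, hi = 0.0, 1.0
    while resamples < max_resamples:
        for _ in range(batch):
            positives += resample_is_positive()
        resamples += batch
        lo, hi = wald_interval(positives, resamples)
        if hi < alpha or lo > alpha:  # interval lies entirely on one side of alpha
            break
    return positives / resamples, resamples, (lo, hi)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-in for an expensive resample whose true p-value is 0.2.
    print(sequential_p_value(lambda: rng.random() < 0.2))
```

Two caveats: checking the interval after every batch means the nominal 95% coverage is no longer exact (each look is another chance to stop on noise), and the Wald interval collapses to zero width when there are no positives, which the beta-based interval avoids.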
Best Answer
If you want to minimize the number of samples, you are probably better off estimating the $p$-value as (# positives + 1) / (# resamples + 1); see Davison and Hinkley (1997, chapter 4). You can then get a good estimate of the Monte Carlo confidence interval from the 2.5th and 97.5th percentiles of the beta distribution with parameters # positives + 1 and # resamples + 1 - # positives. I discussed the logic behind using the beta distribution on pages 9 and 10 of this presentation.
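A minimal, standard-library-only sketch of this recipe: point estimate (k + 1)/(n + 1) and the 2.5th/97.5th percentiles of Beta(k + 1, n + 1 - k). Because both parameters are integers, the beta CDF can be computed through the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a) and inverted by bisection. The function names are hypothetical; with SciPy available, `scipy.stats.beta.ppf` does the same job.

```python
import math
from typing import Tuple


def beta_cdf_int(x: float, a: int, b: int) -> float:
    """CDF of Beta(a, b) at 0 < x < 1, for positive integers a and b.

    Uses I_x(a, b) = P(Binomial(a + b - 1, x) >= a) and sums the
    binomial probabilities in log space so large counts do not overflow.
    """
    m = a + b - 1
    log_x, log_1mx = math.log(x), math.log1p(-x)
    log_m_fact = math.lgamma(m + 1)
    total = 0.0
    for j in range(a, m + 1):
        log_pmf = (log_m_fact - math.lgamma(j + 1) - math.lgamma(m - j + 1)
                   + j * log_x + (m - j) * log_1mx)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def beta_ppf_int(q: float, a: int, b: int, iters: int = 60) -> float:
    """Quantile of Beta(a, b), found by bisection on beta_cdf_int."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if not lo < mid < hi:  # bracket has shrunk to adjacent floats
            break
        if beta_cdf_int(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def monte_carlo_p_interval(positives: int, resamples: int,
                           level: float = 0.95) -> Tuple[float, float, float]:
    """Return (k + 1)/(n + 1) and the equal-tailed Beta(k + 1, n + 1 - k) interval."""
    a = positives + 1
    b = resamples + 1 - positives
    tail = (1.0 - level) / 2.0
    p_hat = (positives + 1) / (resamples + 1)
    return p_hat, beta_ppf_int(tail, a, b), beta_ppf_int(1.0 - tail, a, b)


if __name__ == "__main__":
    p_hat, lo, hi = monte_carlo_p_interval(1000, 5000)
    print(f"p = {p_hat:.4f}, 95% MC interval = [{lo:.4f}, {hi:.4f}]")
```

For the question's numbers (1000 positives out of 5000 resamples) this gives an estimate of about 0.2002 and an interval of roughly [0.189, 0.211]; in a stopping rule you would stop once the lower bound is above 0.05 or the upper bound is below it.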
Davison, A. C. and D. V. Hinkley (1997). Bootstrap Methods and Their Application. Cambridge University Press.