# Solved – Confusion with false discovery rate and multiple testing (on Colquhoun 2014)

I have read this great paper by David Colquhoun: An investigation of the false discovery rate and the misinterpretation of p-values (2014). In essence, he explains why the false discovery rate (FDR) can be as high as $30\%$ even though we control the type I error rate at $\alpha = 0.05$.

However, I am still confused about what happens if I apply FDR control in the case of multiple testing.

Say I have performed a test for each of many variables and calculated the $q$-values using the Benjamini-Hochberg procedure. One variable is significant, with $q = 0.049$. What is the FDR for this finding?

Can I safely assume that in the long run, if I do such analysis on a regular basis, the FDR is not $30\%$, but below $5\%$, because I used Benjamini-Hochberg? That feels wrong. I would say that the $q$-value corresponds to the $p$-value in Colquhoun's paper and that his reasoning applies here as well, so that by using a $q$-threshold of $0.05$ I risk "making a fool of myself" (as Colquhoun puts it) in $30\%$ of the cases. However, when I tried to explain this more formally, I failed.


By coincidence, I read this same paper just a couple of weeks ago. Colquhoun mentions multiple comparisons (including Benjamini-Hochberg) in section 4 when posing the problem, but I find that he does not make the issue clear enough, so I am not surprised by your confusion.

The important point to realize is that Colquhoun is talking about the situation without any multiple comparison adjustments. One can understand Colquhoun's paper as adopting a reader's perspective: he essentially asks what false discovery rate (FDR) he can expect when he reads the scientific literature, that is, what the expected FDR is when no multiple comparison adjustments were made. Multiple comparisons can be taken into account when running several statistical tests within one study, e.g. in one paper. But nobody ever adjusts for multiple comparisons across papers.

If you actually control the FDR, e.g. by following the Benjamini-Hochberg (BH) procedure, then it will be controlled, but only within the family of tests to which you applied the procedure. The problem is that running the BH procedure separately in each study does not guarantee overall FDR control.
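For reference, here is a minimal sketch of how BH-adjusted $p$-values (what the question calls $q$-values) are computed within one family of tests. The function name `bh_adjust` and the example $p$-values are hypothetical. Declaring significant every test whose adjusted value is at most $0.05$ controls the FDR at $5\%$ for that family only (assuming the tests are independent or positively dependent), which is the scope limitation described above.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values ("q-values") for one family of tests.

    The adjusted value for the k-th smallest p-value is
    min over j >= k of (m * p_(j) / j), capped at 1.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # enforces both the cap at 1 and monotonicity
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]   # hypothetical raw p-values
qvals = bh_adjust(pvals)
print(qvals)                        # approximately [0.02, 0.04, 0.04, 0.02]
print([q <= 0.05 for q in qvals])   # [True, True, True, True]
```

Walking from the largest $p$-value downward with a running minimum is what makes this a step-up procedure: a small $p$-value can never receive a larger adjusted value than a bigger one.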

> Can I safely assume that in the long run, if I do such analysis on a regular basis, the FDR is not $30\%$, but below $5\%$, because I used Benjamini-Hochberg?

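No. If you use the BH procedure in every paper, but independently in each of your papers, then you can essentially interpret your BH-adjusted $p$-values as normal $p$-values, and what Colquhoun says still applies. In the extreme case of a paper that reports a single test, the BH-adjusted $p$-value is literally the raw $p$-value.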
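A small simulation can illustrate the difference between the two quantities, under loudly stated assumptions: each hypothetical study runs one two-sided $z$-test (so BH leaves its $p$-value unchanged), $90\%$ of the tested nulls are true, and real effects shift the mean by $2.8$ standard errors, which gives roughly $80\%$ power at $0.05$. All names, parameters and the seed are illustrative. The average per-study false discovery proportion, which is what BH controls, should land near $\pi_0 \alpha = 0.045$, whereas the fraction of all discoveries pooled across studies that are false, closer to what a reader of the literature experiences, should land near $0.045 / (0.045 + 0.08) = 0.36$.

```python
import math
import random


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values for one family of tests."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def two_sided_p(z):
    # P(|Z| >= |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n_studies=20000, m=1, pi0=0.9, effect=2.8, q=0.05, seed=1):
    """Run BH separately in each hypothetical study; compare two error rates."""
    rng = random.Random(seed)
    fdp_sum = 0.0     # sum of per-study false discovery proportions
    total_false = 0   # false discoveries pooled over all studies
    total_disc = 0    # discoveries pooled over all studies
    for _ in range(n_studies):
        is_null = [rng.random() < pi0 for _ in range(m)]
        z = [rng.gauss(0.0 if null else effect, 1.0) for null in is_null]
        qvals = bh_adjust([two_sided_p(x) for x in z])
        rejected = [qv <= q for qv in qvals]
        r = sum(rejected)
        v = sum(1 for rej, null in zip(rejected, is_null) if rej and null)
        fdp_sum += v / r if r else 0.0  # FDP is defined as 0 when nothing is rejected
        total_false += v
        total_disc += r
    return fdp_sum / n_studies, total_false / total_disc


per_study_fdr, pooled_fraction_false = simulate()
print(f"average per-study FDR (what BH controls): {per_study_fdr:.3f}")
print(f"fraction of all discoveries that are false: {pooled_fraction_false:.3f}")
```

Setting `m` above 1 lets you try larger families per study; the pooled fraction then also depends on how many real effects each study happens to contain.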

### General remarks

The answer to Colquhoun's question about the expected FDR is difficult to give because it depends on various assumptions. If, e.g., all the null hypotheses are true, then the FDR will be $100\%$ (i.e. all "significant" findings would be statistical flukes). And if all nulls are in reality false, then the FDR will be zero. So the FDR depends on the proportion of true nulls, and this is something that has to be estimated or guessed externally in order to estimate the FDR. Colquhoun gives some arguments in favor of the $30\%$ number, but this estimate is highly sensitive to the assumptions.
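To make the dependence on the proportion of true nulls concrete, here is a minimal sketch of the simple "tree diagram" calculation. It assumes that a result is declared significant whenever $p \le \alpha$, that a fraction $\pi_0$ of tested hypotheses are truly null, and that each real effect is detected with probability equal to the power $1 - \beta$. The long-run fraction of significant results that are false is then

$$\mathrm{FDR} = \frac{\pi_0 \alpha}{\pi_0 \alpha + (1 - \pi_0)(1 - \beta)}.$$

The function name and the illustrative inputs ($\pi_0 = 0.9$, power $0.8$) are my own choices.

```python
def false_discovery_rate(pi0, alpha=0.05, power=0.8):
    """Long-run fraction of 'significant' results that come from true nulls.

    pi0   -- assumed proportion of tested null hypotheses that are true
    alpha -- significance threshold (type I error rate per test)
    power -- assumed probability of detecting a real effect (1 - beta)
    """
    false_pos = pi0 * alpha        # true nulls that still pass p <= alpha
    true_pos = (1 - pi0) * power   # real effects that pass p <= alpha
    return false_pos / (false_pos + true_pos)


print(false_discovery_rate(1.0))   # all nulls true: 1.0
print(false_discovery_rate(0.0))   # all nulls false: 0.0
print(false_discovery_rate(0.9))   # 0.045 / 0.125, about 0.36
```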

I think the paper is mostly reasonable, but I dislike that it makes some claims sound way too bold. E.g. the first sentence of the abstract is:

> If you use $p=0.05$ to suggest that you have made a discovery, you will be wrong at least $30\%$ of the time.

This is formulated too strongly and can actually be misleading.
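Under the simple $p \le \alpha$ model, a quick sweep shows how much the answer moves with the assumed proportion of true nulls $\pi_0$ and the power. The grid of values is purely illustrative, and this model is not necessarily the exact calculation behind the abstract's figure. With half of the nulls true and power $0.8$, the fraction of false discoveries is about $6\%$, while with $\pi_0 = 0.9$ and power $0.2$ it is about $69\%$.

```python
def false_discovery_rate(pi0, alpha=0.05, power=0.8):
    # Tree-diagram formula: false positives / all positives at threshold alpha
    false_pos = pi0 * alpha
    true_pos = (1 - pi0) * power
    return false_pos / (false_pos + true_pos)


def sweep(pi0_values=(0.1, 0.5, 0.9), power_values=(0.2, 0.5, 0.8), alpha=0.05):
    """Return (pi0, power, FDR) for every combination of the assumed inputs."""
    return [(pi0, power, false_discovery_rate(pi0, alpha, power))
            for pi0 in pi0_values for power in power_values]


print(" pi0  power    FDR")
for pi0, power, fdr in sweep():
    print(f"{pi0:4.1f}  {power:5.1f}  {fdr:6.1%}")
```

Across this modest grid the result ranges from under $1\%$ to nearly $70\%$, which is why a single headline number needs its assumptions stated alongside it.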
