# Solved – FDR (multiple testing correction) with skewed p-value distribution

I am confused about multiple comparisons adjustments. I have $p$-values from Fisher's exact test, many of which equal 1 (because many scores in the foreground are 0). Some of the $p$-values are significant without multiple testing correction. The vector consists of 1000 $p$-values with summary

```
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
0.0000013 0.2552000 0.6069000 0.5634000 0.8672000 0.9900000
```

plus 3000 $p$-values equal to 1. The full vector of $p$-values is available at https://dl.dropboxusercontent.com/u/2706915/pval.csv

If I remove all $p$-values equal to 1 and perform the multiple testing correction, some hypotheses remain significant. I expected that adding these $p$-values of 1 back in would simply increase the $q$-values, since the distribution of $p$-values shifts toward 1. However, the R package qvalue then returns $q$-value = 1 for every $p$-value, and I cannot understand this behavior. I thought the FDR assumes a uniform $p$-value distribution, which is not the case for my data. What mistake am I making?
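To make my expectation concrete, here is a minimal sketch (Python, with a hand-rolled Benjamini–Hochberg adjustment rather than the qvalue package, which additionally estimates $\pi_0$ and may behave differently) of what happens when $p$-values of 1 are appended:

```python
# Hand-rolled Benjamini-Hochberg step-up adjustment (the same formula as
# R's p.adjust(p, method = "BH")); pure Python, no dependencies.
def bh_adjust(pvals):
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending by p
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, i in enumerate(reversed(order)):
        k = m - offset                       # rank of pvals[i] among sorted values
        running_min = min(running_min, pvals[i] * m / k)
        adjusted[i] = running_min            # cumulative min from the top down
    return adjusted

raw = [1e-6, 0.001, 0.02, 0.3, 0.7]          # toy "foreground" p-values
with_ones = raw + [1.0] * 15                 # pad with p = 1, as in my data

print(bh_adjust(raw)[:3])        # small adjusted p-values survive
print(bh_adjust(with_ones)[:3])  # adjusted p-values grow: m went from 5 to 20
```

With plain BH the padded $p$-values of 1 inflate $m$ and hence the $q$-values, but they do not force every $q$-value to 1 — which is part of why the qvalue output surprised me.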


A point-by-point response to your questions:

1. Since your $p$-values come from Fisher's exact test, the test statistic is discrete, so $p$-values exactly equal to 1 can genuinely occur. For continuous distributions, such as for $t$ or $z$ statistics, technically all $p$-values are strictly less than 1, although some of them may be very close to 1.

2. You test a bunch of hypotheses, and some of them are significant (without multiple comparisons adjustments), and some of them are not. Great.

3. Generally, one does not need to remove any $p$-values prior to conducting multiple comparisons adjustments for step-wise adjustment procedures (although the FDR gives the same rejections for a given level of $\alpha$ either way). All but one adjusted $p$-value (i.e., $q$-value) will always be larger than the corresponding unadjusted $p$-value. Conversely, one can think of multiple comparisons adjustments as adjusting the rejection probability (e.g., $\alpha$) rather than adjusting the $p$-values; then all but one of the adjusted rejection probabilities are less than the nominal type I error rate. One advantage of working the math out this way is that one never has to truncate adjusted $p$-values at 1.
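The equivalence between the two views in point 3 can be sketched as follows (a hand-rolled Benjamini–Hochberg step-up in Python; the function names are mine, not from any package). Both procedures reject exactly the same hypotheses:

```python
# BH viewed two ways: adjust the p-values (q-values, compared to alpha),
# or adjust the rejection thresholds (alpha * k / m, compared to raw p).
def bh_reject_via_qvalues(pvals, alpha):
    m = len(pvals)
    pairs = sorted(enumerate(pvals), key=lambda t: t[1])  # ascending by p
    q, running = [0.0] * m, 1.0
    for j in range(m - 1, -1, -1):
        i, p = pairs[j]
        running = min(running, p * m / (j + 1))  # cumulative min from the top
        q[i] = running
    return {i for i in range(m) if q[i] <= alpha}

def bh_reject_via_thresholds(pvals, alpha):
    m = len(pvals)
    pairs = sorted(enumerate(pvals), key=lambda t: t[1])
    # Find the largest k with p_(k) <= alpha * k / m; reject p_(1)..p_(k).
    cutoff = -1
    for j, (_, p) in enumerate(pairs):
        if p <= alpha * (j + 1) / m:
            cutoff = j
    return {pairs[j][0] for j in range(cutoff + 1)}

p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
assert bh_reject_via_qvalues(p, 0.05) == bh_reject_via_thresholds(p, 0.05)
```

Note how the threshold view never produces a quantity larger than $\alpha$, whereas the $q$-value view would need truncation at 1 for large $p$-values.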

4. It sounds like, after adjustment for multiple comparisons using the FDR, you would not reject any hypotheses. This is a possibility (without seeing your vector of $p$-values it is not possible to show you the math).

5. The FDR does not assume a uniform distribution of $p$-values.

6. You are seemingly not making any mistake, other than being surprised that your results differ from your expectations.

Update: Have a look at this spreadsheet, which produces both the adjusted $\alpha$ values (i.e., the FDR thresholds) and, alternatively, the adjusted $p$-values, for the 927 $p$-values $<1$ in the file you supplied.

Notice that: (1) column B contains the $p$-values $<1$ sorted largest to smallest; (2) column C contains the sorting order ($i$); (3) the adjusted $\frac{\alpha}{2} = \frac{0.05}{2}\times\frac{927+1-i}{927}$; (4) the adjusted $p$-values $=\frac{927}{927+1-i}p_{i}$; and finally, (5) you would reject the hypotheses corresponding to the two smallest $p$-values because (a) $3.78\times 10^{-5} < 5.39\times 10^{-5}$ (i.e., $p_{926} < \alpha_{926}^{*}$), or alternately (b) $0.0175 < 0.025$ (i.e., $q_{926} < \frac{\alpha}{2}$).
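The spreadsheet's two adjustment columns can be reproduced with a short sketch (Python; `adjusted_alpha` and `adjusted_p` are hypothetical helper names implementing the Update's formulas, with $m = 927$ and the per-family level $\alpha/2 = 0.025$):

```python
# Index i runs from 1 (largest p-value) to m (smallest), as in column C.
def adjusted_alpha(i, m=927, alpha_half=0.025):
    # Column D of the spreadsheet: alpha_half * (m + 1 - i) / m
    return alpha_half * (m + 1 - i) / m

def adjusted_p(p, i, m=927):
    # Column E of the spreadsheet: p * m / (m + 1 - i)
    return p * m / (m + 1 - i)

# Second-smallest p-value (i = 926) from the supplied file:
p_926 = 3.78e-5
print(adjusted_alpha(926))     # ~5.39e-5 -> p_926 < alpha*_926, reject
print(adjusted_p(p_926, 926))  # ~0.0175  -> q_926 < alpha/2 = 0.025, reject
```

Either comparison, raw $p$ against the adjusted threshold or adjusted $p$ against $\alpha/2$, leads to the same two rejections.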
