Suppose we have two samples, A and B (say, composed of integers), and we wish to determine whether they are drawn from the same distribution.
If we test this using a two-sample permutation test, specifically by counting permutations in which the difference in sample means is at least as extreme as the observed difference: is there any reason to think we can calculate the two-tailed p-value by looking at one tail and doubling the probability?
This is what my lecture notes appear to say, but I don't understand why we can assume that the tails are symmetric (or why the approach doesn't entail that assumption). Explanations were not forthcoming.
Best Answer
The permutation distribution of your test statistic is not guaranteed to be symmetric, so you can't do it that way. Instead, you add both tails. In your case of two independent samples, the null hypothesis is that the two location parameters are equal. Assuming continuous distributions and equal spread in both groups, we have exchangeability under the null hypothesis. The test statistic $T$ is the difference in means, with $E(T) = 0$ under the null.
The value of $T$ in the original sample is $T_{\text{emp}}$, and its values over the permutations are $T^{\star}$. $\sharp(\cdot)$ is short for "the number of", e.g., $\sharp(T^{\star})$ is the number of permutation test statistics. Then the $p$-value for the two-sided hypothesis is $p_{\text{ts}} = p_{\text{left}} + p_{\text{right}}$, where
$p_{\text{left}} = \frac{\sharp(T^{\star} \le \min(T_{\text{emp}}, -T_{\text{emp}}))}{\sharp(T^{\star})}$

$p_{\text{right}} = \frac{\sharp(T^{\star} \ge \max(T_{\text{emp}}, -T_{\text{emp}}))}{\sharp(T^{\star})}$
(assuming we have the complete permutation distribution). Let's compare both approaches for the case of two independent samples when we can calculate the exact (complete) permutation distribution.
```r
set.seed(1234)
Nj   <- c(9, 8)                       # group sizes
DVa  <- rnorm(Nj[1],  5, 20)^2        # data group 1
DVb  <- rnorm(Nj[2], 10, 20)^2        # data group 2
DVab <- c(DVa, DVb)                   # data from both groups
IV   <- factor(rep(c("A", "B"), Nj))  # grouping factor
idx  <- seq(along=DVab)               # all indices
idxA <- combn(idx, Nj[1])             # all possible first groups

# function to calculate test statistic for a given permutation x
getDM <- function(x) { mean(DVab[x]) - mean(DVab[!(idx %in% x)]) }
resDM <- apply(idxA, 2, getDM)        # test statistic for all permutations
diffM <- mean(DVa) - mean(DVb)        # empirical test statistic
```
Now calculate the $p$-values and validate the proposed solution against the implementation in R's coin package. Observe that $p_{\text{left}} \neq p_{\text{right}}$, so it matters which way you calculate $p_{\text{ts}}$.
```r
> (pL <- sum(resDM <= min(diffM, -diffM)) / length(resDM))  # left p-value
[1] 0.1755245
> (pR <- sum(resDM >= max(diffM, -diffM)) / length(resDM))  # right p-value
[1] 0.1585356
> 2*pL                                           # doubling left p-value
[1] 0.351049
> 2*pR                                           # doubling right p-value
[1] 0.3170712
> pL+pR                                          # two-sided p-value
[1] 0.3340601
> sum(abs(resDM) >= abs(diffM)) / length(resDM)  # two-sided p-value (more concise)
[1] 0.3340601

# validate with coin implementation
> library(coin)                                  # for oneway_test()
> oneway_test(DVab ~ IV, alternative="two.sided", distribution="exact")

        Exact 2-Sample Permutation Test

data:  DVab by IV (A, B)
Z = 1.0551, p-value = 0.3341
alternative hypothesis: true mu is not equal to 0
```
P.S. For the Monte-Carlo case where we only sample from the permutation distribution, the $p$-values would be defined like this:
$p_{\text{left}} = \frac{\sharp(T^{\star} \le \min(T_{\text{emp}}, -T_{\text{emp}})) + 1}{\sharp(T^{\star}) + 1}$

$p_{\text{right}} = \frac{\sharp(T^{\star} \ge \max(T_{\text{emp}}, -T_{\text{emp}})) + 1}{\sharp(T^{\star}) + 1}$

$p_{\text{ts}} = \frac{\sharp(\text{abs}(T^{\star}) \ge \text{abs}(T_{\text{emp}})) + 1}{\sharp(T^{\star}) + 1}$
The intuitive reason for counting one more extreme case is that the empirical sample itself must be counted as one of the permutations. Otherwise, the permutation $p$-value could be 0, which cannot happen in the continuous case (note: some texts recommend this correction, some don't).
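As a minimal sketch of the Monte-Carlo case, the corrected $p$-values above could be computed as follows. This reuses the data setup from the exact example; the number of random permutations `nPerm` is an arbitrary choice for illustration, not something prescribed by the formulas.

```r
set.seed(1234)
Nj   <- c(9, 8)                    # group sizes
DVa  <- rnorm(Nj[1],  5, 20)^2     # data group 1
DVb  <- rnorm(Nj[2], 10, 20)^2     # data group 2
DVab <- c(DVa, DVb)                # data from both groups
diffM <- mean(DVa) - mean(DVb)     # empirical test statistic

nPerm <- 9999                      # number of random permutations (arbitrary)
resDM <- replicate(nPerm, {
    perm <- sample(seq_along(DVab), Nj[1])     # random first group
    mean(DVab[perm]) - mean(DVab[-perm])
})

# Monte-Carlo p-values with the +1 correction
# (the empirical sample counts as one permutation)
pL  <- (sum(resDM <= min(diffM, -diffM)) + 1) / (nPerm + 1)
pR  <- (sum(resDM >= max(diffM, -diffM)) + 1) / (nPerm + 1)
pTS <- (sum(abs(resDM) >= abs(diffM))    + 1) / (nPerm + 1)
```

Since the two tails are disjoint here, $p_{\text{left}} + p_{\text{right}}$ exceeds $p_{\text{ts}}$ by exactly $1/(\sharp(T^{\star}) + 1)$, because the empirical sample is counted once in each tail but only once in the two-sided count.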