Solved – Two-sample $t$-test vs Tukey’s method

Even if the $F$ statistic in an ANOVA test is significant, we may need to do further testing before drawing conclusions. The most common way to do this is with a multiple comparisons procedure. I am trying to understand why Tukey's method is used instead of the two-sample $t$-test. The passage below suggests that doing all the pairwise tests (presumably two-sample $t$-tests) can lead to false positives (Type I errors). Why should this happen conceptually?

Tukey's procedure allows us to conduct separate tests to decide
whether $\mu_i = \mu_j$ for each pair of means in an ANOVA study of
$k$ population means. Like all multiple comparison procedures, Tukey's
method is based on the selection of a "family" significance level,
$\alpha$, that applies to the entire collection of pairwise hypothesis
tests. For example, when using the Tukey procedure with a significance
level of, say, 5%, we are assured that there is at most a 5% chance of
obtaining a false positive among the entire set of pairwise tests.
That is, there is at most a 5% chance of mistakenly concluding that
two population means differ when, in fact, they are equal. This is
very different from simply conducting all the pairwise tests as
individual tests, each at $\alpha=0.05$, which can result in a high
probability of finding false positives among the pairwise tests.

It's simply that if (under the null hypothesis of no effects) there's a 5% chance of a false positive in each one of, say, 20 tests, then there's a greater than 5% chance of a false positive in at least one of those 20 tests. If the tests are independent, the chance of getting no false positive in any of them is $0.95^{20} \approx 0.358$, or about 35.8%, and so the chance of a false positive in at least one is about 64.2%. This is the family-wise error rate that multiple comparisons procedures control. Of course, pairwise comparisons of all means after ANOVA are not independent, and Tukey's test takes this into account.
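As a rough illustration of the arithmetic above, here is a minimal Python sketch. The function names are made up for this example. The first function just evaluates $1 - (1 - \alpha)^m$ for $m$ independent tests. The second is a small Monte Carlo check of what happens when all $k(k-1)/2$ pairwise comparisons are run on the same $k$ group means with no real differences. To stay within the standard library it assumes a known standard deviation of 1 and uses $z$-tests rather than $t$-tests, so it is a simplified stand-in for the real procedure, not Tukey's method itself.

```python
import random
from math import comb, sqrt
from statistics import NormalDist


def fwer_independent(alpha: float, m: int) -> float:
    """Chance of at least one false positive among m independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def simulate_pairwise_fwer(k: int, n: int, alpha: float = 0.05,
                           trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo family-wise error rate for all pairwise z-tests among k groups.

    Assumptions (for illustration only): every group has true mean 0,
    known standard deviation 1, and n observations.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided cutoff, about 1.96
    se_diff = sqrt(2.0 / n)                       # std. error of a difference of two means
    hits = 0
    for _ in range(trials):
        # Draw the k sample means directly: each is N(0, 1/n) under the null.
        means = [rng.gauss(0.0, 1.0 / sqrt(n)) for _ in range(k)]
        # At least one pair is declared different iff the largest gap exceeds the cutoff.
        if max(means) - min(means) > z_crit * se_diff:
            hits += 1
    return hits / trials


# The 20-test example from the answer.
print(f"20 independent tests: {fwer_independent(0.05, 20):.3f}")  # 0.642

# Five groups give comb(5, 2) = 10 pairwise comparisons.
k = 5
print(f"pairwise tests for k={k}: {comb(k, 2)}")
print(f"independence formula:  {fwer_independent(0.05, comb(k, 2)):.3f}")
print(f"simulated (dependent): {simulate_pairwise_fwer(k, n=10):.3f}")
```

The simulated rate should come out well above 5% but below the figure from the independence formula, because the pairwise comparisons share the same group means. Tukey's procedure handles this by setting its cutoff from the distribution of the largest gap between means (the studentized range), so that the family-wise rate is held at $\alpha$.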
