Can Bonferroni be applied for dependent multiple tests?

Suppose we have 3 classes (A, B and C). Their members took a math test and a philosophy test.

What I want to know is which class performs better on math than the others, and which class performs better on philosophy. They took two different tests, but I want to discuss these subjects completely separately.

If there is no correlation between the math scores and the philosophy scores, I think we can simply apply the Bonferroni correction. Since we have 3 classes (hence 3 pairwise comparisons per subject), the significance level is 0.05/3. We then use a t-test (or something similar) and conclude, for instance, that A and B got significantly different scores on the math test but that their difference on the philosophy test is not significant. (Here we use the t-test 6 times in total, since we have to consider 2 subjects times 3 pairs of classes.) Is this idea correct?

What if there is a correlation, meaning students who are good at math tend to be bad at philosophy and vice versa? Do I have to lower the significance level? If I have to, what significance level should be used?

To show that the Bonferroni correction controls the familywise error rate (FWER), you do not need to assume independence, so the familywise type I error is controlled whenever you apply the Bonferroni correction. The proof rests on Boole's inequality, and that inequality holds whether the tests are dependent or independent.

So if your significance level is $\alpha$ and you perform $n$ tests, independent or not, each at a significance level of $\alpha/n$, then the familywise error over all the tests is controlled at level $\alpha$: the probability of at least one type I error in the family is lower than or equal to $\alpha$.
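A sketch of that argument using only Boole's inequality (the union bound): let $R_i$ be the event that test $i$ rejects its null hypothesis and let $I_0$ be the set of true null hypotheses, with $|I_0| = n_0 \le n$. If each test is valid at level $\alpha/n$, then

$$\mathrm{FWER} = P\Big(\bigcup_{i \in I_0} R_i\Big) \le \sum_{i \in I_0} P(R_i) \le n_0 \cdot \frac{\alpha}{n} \le \alpha.$$

No step uses independence: the first inequality holds for arbitrary events.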

The correction may therefore be "conservative", i.e. the familywise type I error may be strictly lower than $\alpha$; so to speak, "you make too few" type I errors.
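As a small illustration, assuming all null hypotheses are true and the $n$ tests are independent, the exact FWER of Bonferroni has the closed form $1-(1-\alpha/n)^n$. This sketch (the function name is hypothetical) shows that it equals $\alpha$ for a single test and falls slightly below $\alpha$ for several tests (about 0.049 for $n = 6$).

```python
def bonferroni_fwer_independent(alpha: float, n: int) -> float:
    """Exact FWER when n independent tests are each run at level alpha / n
    and every null hypothesis is true: 1 - (1 - alpha/n)^n."""
    return 1.0 - (1.0 - alpha / n) ** n


if __name__ == "__main__":
    for n in (1, 3, 6):
        print(n, round(bonferroni_fwer_independent(0.05, n), 5))
```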

At first glance this does not seem like a problem: why would anyone object to making too few (type I) errors?

However, for a fixed sample size there is a trade-off between the power of a test and the probability of a type I error: the lower the probability of a type I error, the higher the probability of a type II error, and thus the lower the power of the test.

So if the Bonferroni correction is conservative, it still controls the familywise error at level $\alpha$, but the actual familywise error is strictly lower than $\alpha$. Since a type I error probability that is too low implies a loss of power, it follows that whenever the Bonferroni correction is conservative, you lose power!
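To make the power loss concrete, here is a sketch for a two-sided z-test, assuming the test statistic is normal with unit variance and mean `delta` under the alternative (`delta` and the function name are illustrative choices, not from the original). Running the same test at $\alpha/n$ instead of $\alpha$ raises the critical value and lowers the power.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def two_sided_z_power(delta: float, level: float) -> float:
    """Power of a two-sided z-test at the given level when z ~ N(delta, 1)."""
    crit = Z.inv_cdf(1 - level / 2)
    # Reject when |z| > crit: upper tail plus lower tail under the alternative.
    return (1 - Z.cdf(crit - delta)) + Z.cdf(-crit - delta)


if __name__ == "__main__":
    alpha, n, delta = 0.05, 6, 2.5
    print("level alpha  :", round(two_sided_z_power(delta, alpha), 3))
    print("level alpha/n:", round(two_sided_z_power(delta, alpha / n), 3))
```

With `delta = 0` the function returns the level itself, which is a quick sanity check that the two tails are computed correctly.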

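The Bonferroni bound is attained only when the rejection events are mutually exclusive, so the correction is (slightly) conservative even for independent tests, and it typically becomes more conservative as the dependence between the test statistics grows.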
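A Monte Carlo sketch of this effect, relevant to the math/philosophy question: assume (as a simplification, not a model of the actual t-tests) two two-sided z-statistics that are standard normal under their nulls and jointly normal with correlation `rho`, each tested at $\alpha/2$. The estimated FWER stays at or below $\alpha$ for every `rho`, but drops as $|\rho|$ grows, down to $\alpha/2$ at perfect correlation. Because both tests are two-sided, a negative correlation (good at math, bad at philosophy) behaves like a positive one of the same size.

```python
import math
import random
from statistics import NormalDist

Z = NormalDist()


def simulated_fwer(rho: float, alpha: float = 0.05,
                   n_sim: int = 50_000, seed: int = 1) -> float:
    """Estimated FWER of Bonferroni for two correlated two-sided z-tests
    (both null hypotheses true, correlation rho with -1 <= rho <= 1)."""
    rng = random.Random(seed)
    crit = Z.inv_cdf(1 - (alpha / 2) / 2)  # each test two-sided at alpha / 2
    hits = 0
    for _ in range(n_sim):
        z1 = rng.gauss(0.0, 1.0)
        # Construct z2 so that corr(z1, z2) = rho and z2 ~ N(0, 1).
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        if abs(z1) > crit or abs(z2) > crit:
            hits += 1  # at least one false rejection in the family
    return hits / n_sim


if __name__ == "__main__":
    for rho in (-0.95, 0.0, 0.5, 0.95, 1.0):
        print(rho, simulated_fwer(rho))
```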

To conclude: you do not need independence to apply Bonferroni; it still controls the familywise error. In the case of dependence between tests, however, it will be (more) conservative, and even though the familywise type I error is controlled, this results in a loss of power.

Note: The Holm procedure controls the familywise type I error under the same conditions as Bonferroni (no independence needed). It too will be conservative for dependent tests, but never more conservative than Bonferroni: it rejects at least every hypothesis that Bonferroni rejects.
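A minimal sketch of both procedures on a list of p-values (the function names and the example p-values are hypothetical). Holm sorts the p-values, compares the $k$-th smallest (counting from 1) with $\alpha/(n-k+1)$, and stops at the first one that fails; Bonferroni compares every p-value with $\alpha/n$.

```python
def bonferroni(pvals, alpha=0.05):
    """Indices of hypotheses rejected by Bonferroni (p_i <= alpha / n)."""
    n = len(pvals)
    return [i for i, p in enumerate(pvals) if p <= alpha / n]


def holm(pvals, alpha=0.05):
    """Indices of hypotheses rejected by Holm's step-down procedure."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])  # smallest p-value first
    rejected = []
    for k, i in enumerate(order):  # k = 0, 1, ..., n - 1
        if pvals[i] <= alpha / (n - k):
            rejected.append(i)
        else:
            break  # first non-rejection: retain this and all larger p-values
    return sorted(rejected)


if __name__ == "__main__":
    p = [0.001, 0.009, 0.012, 0.04, 0.2, 0.3]  # hypothetical p-values
    print("Bonferroni rejects:", bonferroni(p))
    print("Holm rejects:      ", holm(p))
```

For these six hypothetical p-values Bonferroni rejects only the first hypothesis, while Holm also rejects the second and third.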

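Note: The Šidák correction, by contrast, is derived under the assumption of independence (it is known to remain valid under some forms of positive dependence, but not in general).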
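For comparison, Šidák's per-test level is $1-(1-\alpha)^{1/n}$, chosen so that $n$ independent tests have a FWER of exactly $\alpha$. This sketch (hypothetical function names) shows it is only slightly larger than Bonferroni's $\alpha/n$.

```python
def sidak_level(alpha: float, n: int) -> float:
    """Per-test level giving FWER exactly alpha for n independent tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n)


def bonferroni_level(alpha: float, n: int) -> float:
    """Per-test level from Bonferroni; valid without independence."""
    return alpha / n


if __name__ == "__main__":
    for n in (3, 6):
        print(n, round(sidak_level(0.05, n), 6), round(bonferroni_level(0.05, n), 6))
```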

EDIT 21-09-2016

A. FWER is applicable to your case

FWER control is needed whenever you use one and the same sample to test a family of hypotheses. This is your case: e.g. if you want to show that the classes perform differently on math, you have to test three hypotheses:

  1. $H_0^{(1)}: \mu_{Am}=\mu_{Bm}$ versus $H_1^{(1)}: \mu_{Am} \ne \mu_{Bm}$
  2. $H_0^{(2)}: \mu_{Am}=\mu_{Cm}$ versus $H_1^{(2)}: \mu_{Am} \ne \mu_{Cm}$
  3. $H_0^{(3)}: \mu_{Bm}=\mu_{Cm}$ versus $H_1^{(3)}: \mu_{Bm} \ne \mu_{Cm}$

where $\mu_{ct}$ is the mean score of class $c$ on test $t$ ($t = m$ for math, $t = p$ for philosophy), so e.g. $\mu_{Am}$ is the mean of class $A$ on math.

There is no doubt that this is a family of three hypotheses. If you test each of these three hypotheses at significance level $\alpha$, your familywise type I error can be larger than $\alpha$, so to control the type I error "familywise" you have to do "something" to bring it back down to $\alpha$.
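A Monte Carlo sketch of that inflation, under simplifying assumptions that are not in the original: the three class means are equal, normally distributed with a known, common standard error, and each pair is compared with a two-sided z-test. Testing each pair at $\alpha$ gives a FWER well above $\alpha$, while testing each at $\alpha/3$ keeps it at or below $\alpha$.

```python
import math
import random
from statistics import NormalDist

Z = NormalDist()


def pairwise_fwer(level: float, n_sim: int = 50_000, seed: int = 2) -> float:
    """Estimated FWER of the three pairwise two-sided z-tests among classes
    A, B, C when all true means are equal (global null hypothesis).

    Each class mean is simulated in standard-error units as N(0, 1), so the
    difference of two class means has standard deviation sqrt(2).
    """
    rng = random.Random(seed)
    crit = Z.inv_cdf(1 - level / 2)
    hits = 0
    for _ in range(n_sim):
        a, b, c = (rng.gauss(0.0, 1.0) for _ in range(3))
        zs = [(a - b) / math.sqrt(2), (a - c) / math.sqrt(2), (b - c) / math.sqrt(2)]
        if any(abs(z) > crit for z in zs):
            hits += 1  # at least one pair wrongly declared different
    return hits / n_sim


if __name__ == "__main__":
    print("each pair at alpha  :", pairwise_fwer(0.05))
    print("each pair at alpha/3:", pairwise_fwer(0.05 / 3))
```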

Bonferroni is one possibility, and since Bonferroni is precisely about controlling the familywise error rate, there is no doubt that it is applicable to your case. A nice introduction can be found at this link.

B. Detailed analysis of your test

In your comment below you are more specific about what you want to do. I quote:

"What I want to know is, for example, class A does better on math than B does. I will not conclude like class A is better at studying than B is because A got higher scores on at least one subject (either of the subjects or maybe both). Is this right?"

This reminds me of a discussion I had with @amoeba, @Wayne and @Anoldmaninthesea in the post "What's wrong with 'multiple testing correction' compared to 'joint tests'?".

My point is that you must define precisely what you want to test.

If you want to check whether the three classes perform differently on math, then you should test $H_0: \mu_{Am}=\mu_{Bm}=\mu_{Cm}$ versus $H_1: \mu_{Am} \ne \mu_{Bm} \text{ or } \mu_{Am} \ne \mu_{Cm} \text{ or } \mu_{Bm} \ne \mu_{Cm}$. If you know the joint distribution of the class scores on math, you can do a joint test (see "What's wrong with 'multiple testing correction' compared to 'joint tests'?").
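One simple instance of such a joint test, assuming (this is an illustrative simplification, not from the original) normal scores with a known common standard deviation $\sigma$ and equal class sizes $m$: under $H_0$, $Q = \frac{m}{\sigma^2}\sum_c (\bar x_c - \bar x)^2$ follows a $\chi^2$ distribution with $k-1$ degrees of freedom, and for $k = 3$ classes its tail probability is simply $e^{-Q/2}$. The function name and the example numbers are hypothetical.

```python
import math


def joint_test_known_sigma(class_means, sigma, m):
    """Chi-square test of H0: all three class means are equal.

    Assumes normal scores, a known common standard deviation sigma and the
    same number m of students per class. Returns (Q, p-value).
    """
    k = len(class_means)
    if k != 3:
        raise ValueError("the closed-form p-value below is for exactly 3 classes")
    grand = sum(class_means) / k
    q = m / sigma**2 * sum((x - grand) ** 2 for x in class_means)
    p_value = math.exp(-q / 2)  # P(chi-square with 2 df > q)
    return q, p_value


if __name__ == "__main__":
    # Hypothetical class means on math, sigma = 1.5, 30 students per class.
    print(joint_test_known_sigma([6.0, 6.5, 7.0], sigma=1.5, m=30))
```

This single test replaces the three pairwise tests, so no multiplicity correction is needed for the question "do the classes differ on math at all?".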

If you do not know that joint distribution, you can instead perform a family of tests, provided you control the type I error rate!

The family of tests you can perform consists of the three tests I mentioned above. If you want to control the FWER at level $\alpha$, you should apply a correction such as Bonferroni.
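The families themselves can be enumerated mechanically. This sketch (names are hypothetical) lists one null hypothesis $\mu_{c_1 s} = \mu_{c_2 s}$ per subject $s$ and per pair of classes, giving the three math tests above, or six tests if philosophy is included, together with the corresponding Bonferroni per-test levels.

```python
from itertools import combinations, product

CLASSES = ("A", "B", "C")
SUBJECTS = ("math", "philosophy")
ALPHA = 0.05


def pairwise_family(subjects):
    """One hypothesis (subject, class1, class2) per subject and class pair."""
    return [(s, c1, c2) for s, (c1, c2) in product(subjects, combinations(CLASSES, 2))]


math_family = pairwise_family(["math"])  # the three tests listed above
both_family = pairwise_family(SUBJECTS)  # math and philosophy together

if __name__ == "__main__":
    print(len(math_family), "tests, each at", ALPHA / len(math_family))
    print(len(both_family), "tests, each at", ALPHA / len(both_family))
```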

However, according to the comment I quoted above, you don't want to test whether the classes perform differently on math, but whether the classes differ in how well they study, meaning that they perform better on math OR on philosophy. This implies a different test: $H_0: \mu_{Am}=\mu_{Bm}=\mu_{Cm} \text{ AND } \mu_{Ap}=\mu_{Bp}=\mu_{Cp}$ versus …. (the opposite).

This can be replaced by a family of six hypotheses. With a family of six hypotheses you should divide $\alpha$ by six, as @Björn said in his comment below your question. However, if there is dependence among the tests, dividing by six leads to conservative FWER control and to a loss of power, as I explained above.

Why is that? I think that is the reason behind your question. Suppose you knew the dependence between the math and philosophy results and it were perfect: e.g. if scores range from 0 to 10, assume that each student's philosophy score is 10 minus their math score.
In that case, if the classes score differently on math, they will also score differently on philosophy (because of the assumed dependence), so I only need to perform the tests for math (a family of three tests).
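A small sketch of this perfect-dependence case, using made-up scores (hypothetical, not real data) and Welch two-sample t-statistics as an illustrative choice of test. Since philosophy $= 10 -$ math, every philosophy statistic is exactly the negative of the corresponding math statistic (same variances, opposite mean differences), so any two-sided test reaches the same decision on both subjects and the three philosophy tests add nothing.

```python
import math
from statistics import mean, variance

# Hypothetical 0-10 scores, five students per class.
math_scores = {
    "A": [7, 8, 6, 9, 7],
    "B": [5, 6, 6, 4, 7],
    "C": [6, 7, 5, 6, 8],
}
# Perfect dependence as in the text: philosophy = 10 - math for every student.
phil_scores = {c: [10 - x for x in xs] for c, xs in math_scores.items()}


def welch_t(xs, ys):
    """Welch two-sample t statistic: difference of means over its standard error."""
    se = math.sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    return (mean(xs) - mean(ys)) / se


def pairwise_t(scores):
    """t statistic for each pair of classes."""
    pairs = [("A", "B"), ("A", "C"), ("B", "C")]
    return {p: welch_t(scores[p[0]], scores[p[1]]) for p in pairs}


t_math = pairwise_t(math_scores)
t_phil = pairwise_t(phil_scores)

if __name__ == "__main__":
    for pair in t_math:
        # The philosophy statistic mirrors the math one with the sign flipped.
        print(pair, round(t_math[pair], 3), round(t_phil[pair], 3))
```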

If the dependence is not perfect, you end up somewhere in between.

What still holds is that, even with dependence, the Bonferroni correction controls the FWER at level $\alpha$; however, if there is dependence, this results in a loss of power (you will reject too few null hypotheses if you divide by six).
