I have a 2×2 table with two independent groups of people who replied Yes or No to a survey:
| | Yes | No |
|---|---|---|
| Group A | 350 | 1250 |
| Group B | 1700 | 3800 |
Could you help me find a test that can be run on these figures to determine whether there is a statistically significant difference between the two groups?
Best Answer
BruceET provides one way of analyzing this table. There are several tests for 2 by 2 tables which are all asymptotically equivalent, meaning that with enough data all the tests are going to give you the same answer. I present them here with R code for posterity.
In my answer, I'm going to transpose the table since I find it easier to have groups as columns and outcomes as rows.
The table is then
| | Group A | Group B |
|---|---|---|
| Yes | 350 | 1700 |
| No | 1250 | 3800 |
I'll reference the elements of this table as
| | Group A | Group B |
|---|---|---|
| Yes | $a$ | $b$ |
| No | $c$ | $d$ |
$N$ will be the sum of all the elements $N = a+b+c+d$.
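Since every test below uses the same quantities, it helps to define them once in R. This setup block is my own addition; the later snippets assume these objects exist:

```r
# The table with groups as columns and outcomes as rows
m <- matrix(c(350, 1250, 1700, 3800), nrow = 2,
            dimnames = list(c("Yes", "No"), c("Group A", "Group B")))

a <- m[1, 1]; b <- m[1, 2]   # Yes counts in each group
c <- m[2, 1]; d <- m[2, 2]   # No counts in each group

n1 <- a + c; n2 <- b + d     # column (group) totals
m1 <- a + b; m2 <- c + d     # row (outcome) totals
N  <- a + b + c + d          # grand total
```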
The Chi Square Test
Perhaps the most common test for 2 by 2 tables is the chi square test. Roughly, the null hypothesis of the chi square test is that the proportion of people who answer yes is the same in each group, and in particular it is the same as the proportion of people who answer yes were I to ignore groups completely.
The test statistic is
$$ X^2_P = \dfrac{(ad-bc)^2N}{n_1n_2m_1m_2} \sim \chi^2_1 $$
Here $n_i$ are the column totals and $m_i$ are the row totals. This test statistic is asymptotically distributed as Chi square (hence the name) with one degree of freedom.
The math is not important, to be frank. Most software packages, like R, implement this test readily.
```r
m <- matrix(c(350, 1250, 1700, 3800), nrow = 2)
chisq.test(m, correct = FALSE)

#         Pearson's Chi-squared test
#
# data:  m
# X-squared = 49.257, df = 1, p-value = 2.246e-12
```
The `correct = FALSE` argument is there so that R implements the test as I have written it and does not apply a continuity correction, which is useful for small samples. The p value is very small here, so we can conclude that the proportion of people who answer yes in each group is different.
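To confirm that the formula above really is what `chisq.test` computes, here is a by-hand check (a minimal sketch, assuming the objects from the setup block):

```r
# Pearson chi square statistic, straight from the formula
X2 <- (a*d - b*c)^2 * N / (n1 * n2 * m1 * m2)
X2
# [1] 49.25663
pchisq(X2, df = 1, lower.tail = FALSE)
# [1] 2.245731e-12
```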
Test of Proportions
The test of proportions is similar to the chi square test. Let $\pi_i$ be the probability of answering Yes in group $i$. The test of proportions tests the null that $\pi_1 = \pi_2$.
In short, the test statistic for this test is
$$ z = \dfrac{p_1-p_2}{\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}} \sim \mathcal{N}(0,1) $$
Again, $n_i$ are the column totals and $p_1 = a/n_1$ and $p_2=b/n_2$. This test statistic has a standard normal asymptotic distribution. If your alternative is that $p_1 \neq p_2$, then you reject the null when this test statistic is larger than 1.96 in absolute value (at the usual 5% level).
In R
```r
# Note that the n argument is the column sums
prop.test(x = c(350, 1700), n = c(1600, 5500), correct = FALSE)

# data:  c(350, 1700) out of c(1600, 5500)
# X-squared = 49.257, df = 1, p-value = 2.246e-12
# alternative hypothesis: two.sided
# 95 percent confidence interval:
#  -0.11399399 -0.06668783
# sample estimates:
#    prop 1    prop 2
# 0.2187500 0.3090909
```
Note that the `X-squared` statistic in the output of this test is identical to the chi-square test; there is a good reason for that, sketched briefly below. Note also that this test provides a confidence interval for the difference in proportions, which is an added benefit over the chi square test.
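In brief, `prop.test` estimates a single pooled proportion under the null, and that pooled z statistic squared is algebraically the Pearson chi square (the formula displayed above, with unpooled variances, is the Wald version and is asymptotically equivalent). A minimal check, using the setup objects:

```r
p1 <- a / n1                              # 0.21875
p2 <- b / n2                              # 0.3090909
p  <- m1 / N                              # pooled proportion under the null
z  <- (p1 - p2) / sqrt(p*(1 - p) * (1/n1 + 1/n2))
z^2                                       # 49.25663, the X-squared above
2 * pnorm(abs(z), lower.tail = FALSE)
```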
Fisher's Exact Test
Fisher's exact test conditions on the quantities $n_1 = a+c$ and $m_1 = a + b$. The null of this test is that the probability of success in each group is the same, $\pi_1 = \pi_2$, like the test of proportions. The actual null hypothesis in the derivation of the test is about the odds ratio, but that is not important now.
The exact probability of observing the table provided is
$$ p = \dfrac{n_1! n_2! m_1! m_2!}{N! a! b! c! d!} $$
John Lachin writes
> Thus, the probability of the observed table can be considered to arise from a collection of $N$ subjects of whom $m_1$ have positive response, with $a$ of these being drawn from the $n_1$ subjects in group 1 and $b$ from among the $n_2$ subjects in group 2 ($a+b=m_1$, $n_1 + n_2 = N$).
Importantly, this is not the p value. It is the probability of observing this table. In order to compute the p value, we need to sum up probabilities of observing tables which are more extreme than this one.
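Before handing this to R, you can build that sum yourself from the hypergeometric distribution (a sketch using the setup objects; by default `fisher.test` counts every table whose probability is no larger than the observed one, up to a small numerical tolerance):

```r
# Every feasible table is indexed by the Yes count in Group A
x     <- max(0, n1 - m2):min(n1, m1)
probs <- dhyper(x, m1, m2, n1)   # probability of each table
p.obs <- dhyper(a, m1, m2, n1)   # probability of the observed table
sum(probs[probs <= p.obs])       # two-sided p value; should match fisher.test
```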
Luckily, R does this for us
```r
m <- matrix(c(350, 1250, 1700, 3800), nrow = 2)
fisher.test(m)

#         Fisher's Exact Test for Count Data
#
# data:  m
# p-value = 1.004e-12
# alternative hypothesis: true odds ratio is not equal to 1
# 95 percent confidence interval:
#  0.5470683 0.7149770
# sample estimates:
# odds ratio
#  0.6259224
```
Note the result is about odds ratios and not about probabilities in each group. It is also worth noting, again from Lachin,
> The Fisher-Irwin exact test has been criticized as being too conservative because other unconditional tests have been shown to yield a smaller p value and thus are more powerful.
When the data are large, this point becomes moot because you've likely got enough power to detect small effects, but it all depends on what you're trying to test (as it always does).
Thus far, we have examined what are likely to be the most prevalent tests for this sort of data. The following tests are equivalent to the first two, but are perhaps less known. I present them here for completeness.
Odds Ratio
The odds ratio $\widehat{OR}$ for this table is $ad/bc$, but because the odds ratio is constrained to be strictly positive, it can be more convenient to work with the log odds ratio $\log(\widehat{OR})$.
Asymptotically, the sampling distribution for the log odds ratio is normal. This means we can apply a simple $z$ test. Our test statistic is
$$ Z = \dfrac{\log(\widehat{OR}) - \log(OR)}{\sqrt{\hat{V}\left(\log(\widehat{OR})\right)}}. $$
Here, $\hat{V}(\log(\widehat{OR}))$ is the estimated variance of the log odds ratio and is equal to $1/a + 1/b + 1/c + 1/d$.
In R
```r
# Estimate the odds ratio, then do a z test of the null OR = 1 (log OR = 0)
odds_ratio <- m[1, 1] * m[2, 2] / (m[2, 1] * m[1, 2])
vr <- sum(1/m)                        # 1/a + 1/b + 1/c + 1/d
Z  <- log(odds_ratio) / sqrt(vr)
p.val <- 2 * pnorm(abs(Z), lower.tail = FALSE)
```
which returns a Z value of -6.978754 and a p value less than 0.01.
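The same normal approximation also yields a Wald confidence interval for the odds ratio, which you can compare with the conditional interval that `fisher.test` reported (a sketch reusing `odds_ratio` and `vr` from above):

```r
# 95% Wald interval: exponentiate the interval for the log odds ratio
exp(log(odds_ratio) + c(-1, 1) * qnorm(0.975) * sqrt(vr))
# roughly 0.549 to 0.714, close to fisher.test's conditional interval
```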
Cochran's Test
The test statistic is
$$ X^2_u = \dfrac{\left(\dfrac{n_2a-n_1b}{N}\right)^2}{\dfrac{n_1n_2m_1m_2}{N^3}} \sim \chi^2_1 $$
In R
```r
a <- 350; b <- 1700; c <- 1250; d <- 3800
N  <- a + b + c + d
n1 <- a + c; n2 <- b + d
m1 <- a + b; m2 <- c + d

X <- ((n2*a - n1*b)/N)^2 / ((n1*n2*m1*m2)/N^3)
X
# [1] 49.25663   Look familiar?
p.val <- pchisq(X, 1, lower.tail = FALSE)
p.val
# [1] 2.245731e-12
```
Conditional Mantel-Haenszel (CMH) Test
The CMH Test (I think I've seen this called the Cochran Mantel-Haenszel Test elsewhere) is a test which conditions on the first column total and first row total.
The test statistic is
$$ X^2_c = \dfrac{\left( a - \dfrac{n_1m_1}{N} \right)^2}{\dfrac{n_1n_2m_1m_2}{N^2(N-1)}} \sim \chi^2_1 $$
In R
```r
a <- 350; b <- 1700; c <- 1250; d <- 3800
N  <- a + b + c + d
n1 <- a + c; n2 <- b + d
m1 <- a + b; m2 <- c + d

top    <- (a - n1*m1/N)^2
bottom <- (n1*n2*m1*m2)/(N^2*(N-1))
X <- top/bottom
X
# [1] 49.24969
p.val <- pchisq(X, 1, lower.tail = FALSE)
p.val
# [1] 2.253687e-12
```
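A quick way to see why this is so close to the Pearson statistic: algebraically, the CMH statistic is the Pearson chi square scaled by $(N-1)/N$. You can confirm that directly (a sketch using the objects just defined):

```r
X_pearson <- (a*d - b*c)^2 * N / (n1 * n2 * m1 * m2)
X_pearson * (N - 1) / N
# [1] 49.24969, identical to the CMH statistic above
```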
Likelihood Ratio Test (LRT) (My Personal Favourite)
The LRT compares the difference in log likelihood between a model which freely estimates the group proportions and a model which estimates only a single proportion (not unlike the chi-square test). This test is a bit of overkill in my opinion, as the other tests are simpler, but why not include it? I like it personally because the test statistic is oddly satisfying and easy to remember.
The math, as before, is irrelevant for our purposes. The test statistic is
$$ X^2_G = 2 \log \left( \dfrac{a^a b^b c^c d^d N^N}{n_1^{n_1} n_2^{n_2} m_1^{m_1} m_2^{m_2}} \right) \sim \chi^2_1 $$
In R with some applied algebra to prevent overflow
```r
a <- 350; b <- 1700; c <- 1250; d <- 3800
N  <- a + b + c + d
n1 <- a + c; n2 <- b + d
m1 <- a + b; m2 <- c + d

top    <- c(a, b, c, d, N)
bottom <- c(n1, n2, m1, m2)
# Work on the log scale so the large powers never overflow
X <- 2 * (sum(top*log(top)) - sum(bottom*log(bottom)))
X
# [1] 51.26845   Very close to the other tests
p.val <- pchisq(X, 1, lower.tail = FALSE)
p.val
# [1] 8.05601e-13
```
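If you'd rather have R produce this statistic, one route is a binomial GLM: with one row per group, the null deviance of the model is exactly this likelihood ratio statistic (a sketch; the object names are my own):

```r
yes   <- c(350, 1700)
no    <- c(1250, 3800)
group <- factor(c("A", "B"))

fit <- glm(cbind(yes, no) ~ group, family = binomial)
fit$null.deviance                     # about 51.27, the LRT statistic
pchisq(fit$null.deviance, df = 1, lower.tail = FALSE)
```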
Note that there is a discrepancy between the test statistic for the LRT and the other tests. It has been noted that this test statistic converges to the asymptotic chi square distribution at a slower rate than the chi square test statistic or Cochran's test statistic.
What Test Do I Use?
My suggestion: the test of proportions. It is equivalent to the chi-square test and has the added benefits of a) being directly interpretable in terms of the risk difference, and b) providing a confidence interval for that difference (something you should always be reporting).
I've not included the theoretical motivations for these tests; understanding them is not essential, though in my opinion it is captivating.
If you're wondering where I got all this information, the book "Biostatistical Methods: The Assessment of Relative Risks" by John Lachin takes a painstakingly long time to explain all of this in chapter 2.