I have two categorical variables for each observation in my dataset: environment and behaviour. I'm trying to test for an association between them, i.e. does the environment affect behaviour? The resulting contingency table is:
| | Behv1 | Behv2 | Behv3 | Behv4 |
|---|---|---|---|---|
| Env 1 | 54 | 15 | 16 | 0 |
| Env 2 | 739 | 201 | 13 | 39 |
I am not sure which statistical test I should use. Normally I would use a chi-square test for association, but some of the cell values are small, so I think I should use Fisher's exact test. Or is there another test that would suit better?
Also, I need to report the confidence intervals resulting from the test. How should I do this?
Thanks!
Best Answer
Out of curiosity I ran the chi-square test (in Stata, but any statistical environment should be up to the task!):
    . tabchii 54 15 16 0 \ 739 201 13 39, p

              observed frequency
              expected frequency
              Pearson residual

    ----------------------------------------------
              |                col
          row |       1        2        3        4
    ----------+-----------------------------------
            1 |      54       15       16        0
              |  62.586   17.047    2.289    3.078
              |  -1.085   -0.496    9.063   -1.754
              |
            2 |     739      201       13       39
              | 730.414  198.953   26.711   35.922
              |   0.318    0.145   -2.653    0.514
    ----------------------------------------------

    2 cells with expected frequency < 5

             Pearson chi2(3) =  94.0651   Pr = 0.000
    likelihood-ratio chi2(3) =  51.5291   Pr = 0.000

    . ret li

    scalars:
                      r(N) =  1077
                      r(r) =  2
                      r(c) =  4
                   r(chi2) =  94.06510751973497
                      r(p) =  2.93238684628e-20
                r(chi2_lr) =  51.52908203554749
                   r(p_lr) =  3.77366847284e-11
I note the following.
Any problem with low expected frequencies is relatively slight. Some texts oversell an old rule of thumb that you should worry as soon as any expected frequency drops below 5, but my experience matches a rule of thumb (to be found in Harold Jeffreys, Theory of Probability, Oxford University Press, 1961, among other places) that below 1 is the only common danger zone. Here no expected frequency is that low: the smallest is 2.289.
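For readers without Stata, the expected frequencies can be checked by hand. The following is a minimal sketch in plain Python, assuming the counts from the question; the helper name `expected_frequencies` is made up for illustration and is not part of any library.

```python
# Observed counts from the question: rows are environments, columns are behaviours.
observed = [
    [54, 15, 16, 0],
    [739, 201, 13, 39],
]


def expected_frequencies(table):
    """Expected counts under independence: row total * column total / grand total.

    Hypothetical helper written for this example only.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
flat = [e for row in expected for e in row]

below_5 = sum(e < 5 for e in flat)  # cells flagged by the stricter rule of thumb
below_1 = sum(e < 1 for e in flat)  # cells flagged by the looser rule of thumb

print(f"smallest expected frequency: {min(flat):.3f}")
print(f"cells below 5: {below_5}, cells below 1: {below_1}")
```

With these counts, two cells fall below 5 and none falls below 1, which is the distinction the two rules of thumb turn on.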
Here the two flavours of chi-square statistic (Pearson and likelihood-ratio) are not close, which shows some sensitivity, but the choice between P-values of the order of $10^{-20}$ and $10^{-11}$ is scientifically no choice at all. (The exact test confirms overwhelming significance.)
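Both statistics are simple enough to recompute directly. This is a sketch in plain Python under two stated assumptions: a cell with an observed count of zero contributes nothing to the likelihood-ratio sum (the usual convention), and the P-value uses a closed-form tail formula that holds only for 3 degrees of freedom, which is what a 2×4 table gives. The function name `chi2_sf_3df` is made up for this example.

```python
import math

# Observed counts from the question: rows are environments, columns are behaviours.
observed = [
    [54, 15, 16, 0],
    [739, 201, 13, 39],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pair each observed count with its expected count.
pairs = [
    (o, e)
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
]

# Pearson statistic: sum of (O - E)^2 / E over all cells.
pearson = sum((o - e) ** 2 / e for o, e in pairs)

# Likelihood-ratio statistic: 2 * sum of O * ln(O / E).
# Assumption: cells with O = 0 contribute 0 (the limit of O * ln O as O -> 0).
lr = 2 * sum(o * math.log(o / e) for o, e in pairs if o > 0)


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variate with exactly 3 df.

    Closed form valid for 3 degrees of freedom only; not a general routine.
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


df = (len(observed) - 1) * (len(observed[0]) - 1)  # (rows - 1) * (cols - 1) = 3

p_pearson = chi2_sf_3df(pearson)
p_lr = chi2_sf_3df(lr)

print(f"Pearson chi2({df}) = {pearson:.4f}, P = {p_pearson:.3e}")
print(f"likelihood-ratio chi2({df}) = {lr:.4f}, P = {p_lr:.3e}")
```

The results should agree with the Stata output above. For tables of other dimensions, a general chi-square tail function from a statistics library would be needed in place of the 3 df shortcut.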
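Beyond testing the hypothesis (the answer from any test is an overwhelming Yes! There is an effect!), the more interesting question would seem to be what you can learn from the data. The Pearson residuals, $(\text{observed} - \text{expected}) / \sqrt{\text{expected}}$, flag up some fine structure for behaviours 3 and 4. Most strikingly, behaviour 3 is far more frequent in environment 1 than independence would predict (residual 9.063), while behaviour 4 was never observed there (residual $-1.754$).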
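The residuals can be reproduced cell by cell. Here is a small sketch in plain Python, assuming the counts from the question; the row and column labels are taken from the question's table, and nothing here is specific to Stata.

```python
import math

# Observed counts from the question: rows are environments, columns are behaviours.
observed = [
    [54, 15, 16, 0],
    [739, 201, 13, 39],
]
row_labels = ["Env 1", "Env 2"]
col_labels = ["Behv1", "Behv2", "Behv3", "Behv4"]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson residual for each cell: (observed - expected) / sqrt(expected).
residuals = [
    [(o - e) / math.sqrt(e) for o, e in zip(o_row, e_row)]
    for o_row, e_row in zip(observed, expected)
]

for label, row in zip(row_labels, residuals):
    print(label, "  ".join(f"{r:7.3f}" for r in row))

# Locate the cell that departs most from independence.
i, j = max(
    ((i, j) for i in range(len(residuals)) for j in range(len(residuals[0]))),
    key=lambda ij: abs(residuals[ij[0]][ij[1]]),
)
print(f"largest |residual|: {row_labels[i]}, {col_labels[j]} ({residuals[i][j]:.3f})")
```

The squared residuals sum to the Pearson chi-square statistic, so the table of residuals shows which cells contribute most to it.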