# Solved – Contingency table (2×4) – right test & confidence intervals

I have 2 categorical variables for each observation in my dataset: environment and behaviour. I'm trying to test for an association between them, i.e. does the environment affect behaviour? The resulting contingency table is:

```
        Behv1   Behv2   Behv3   Behv4
Env 1      54      15      16       0
Env 2     739     201      13      39
```

I am not sure which statistical test I should use. Normally for association I would use chi-square, but some of the cell values are too small, so I think I should use Fisher's exact test. Or is there another test that suits better?

Also, I need to report the confidence intervals resulting from the test. How should I do this?

Thanks!


Out of curiosity I ran the chi-square test (in Stata, but any statistical environment should be up to the task!):

```
. tabchii 54 15 16 0 \ 739 201 13 39, p

          observed frequency
          expected frequency
          Pearson residual

----------------------------------------------
          |                col
      row |       1        2        3        4
----------+-----------------------------------
        1 |      54       15       16        0
          |  62.586   17.047    2.289    3.078
          |  -1.085   -0.496    9.063   -1.754
          |
        2 |     739      201       13       39
          | 730.414  198.953   26.711   35.922
          |   0.318    0.145   -2.653    0.514
----------------------------------------------

2 cells with expected frequency < 5

          Pearson chi2(3) =  94.0651   Pr = 0.000
 likelihood-ratio chi2(3) =  51.5291   Pr = 0.000

. ret li

scalars:
                  r(N) =  1077
                  r(r) =  2
                  r(c) =  4
               r(chi2) =  94.06510751973497
                  r(p) =  2.93238684628e-20
            r(chi2_lr) =  51.52908203554749
               r(p_lr) =  3.77366847284e-11
```
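For readers without Stata, the same quantities can be reproduced from the table's margins. What follows is only a minimal Python sketch using the standard library. The variable names and the helper `chi2_sf_3df` are my own, and the closed-form tail probability it uses is valid only for 3 degrees of freedom, as in this 2×4 table.

```python
from math import erfc, exp, log, pi, sqrt

# Observed counts: rows are environments, columns are behaviours.
observed = [[54, 15, 16, 0],
            [739, 201, 13, 39]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency under independence: row total * column total / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson residual: (observed - expected) / sqrt(expected).
residuals = [[(o - e) / sqrt(e) for o, e in zip(obs_row, exp_row)]
             for obs_row, exp_row in zip(observed, expected)]

# Pearson chi-square is the sum of squared Pearson residuals.
pearson = sum(res ** 2 for row in residuals for res in row)

# Likelihood-ratio chi-square: 2 * sum(o * ln(o / e)); zero cells contribute 0.
lr = 2 * sum(o * log(o / e)
             for obs_row, exp_row in zip(observed, expected)
             for o, e in zip(obs_row, exp_row) if o > 0)

df = (len(observed) - 1) * (len(observed[0]) - 1)
assert df == 3  # the closed form below assumes exactly 3 degrees of freedom


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variate with 3 df (closed form)."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


p_pearson = chi2_sf_3df(pearson)
p_lr = chi2_sf_3df(lr)

print(f"Pearson chi2({df}) = {pearson:.4f}, P = {p_pearson:.3e}")
print(f"LR chi2({df})      = {lr:.4f}, P = {p_lr:.3e}")
for row in residuals:
    print("  ".join(f"{res:7.3f}" for res in row))
```

For tables of other dimensions the tail probability needs a general chi-square routine rather than this special case.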

I note that

1. Any problem with low expected frequencies is relatively slight. Some texts oversell an old rule-of-thumb that you should worry whenever expected frequencies drop below 5, but my experience matches a rule-of-thumb (to be found in Harold Jeffreys, Theory of Probability, Oxford University Press, 1961, among other places) that below 1 is the only common danger zone. Here no expected frequency is that low.

2. Here the two flavours of chi-square statistic are not close, which shows some sensitivity to the choice of statistic, but the choice between P-values of the order of $$10^{-20}$$ and $$10^{-11}$$ is scientifically no choice at all. (The exact test confirms overwhelming significance.)

3. Beyond testing the hypothesis (the answer from any test is an overwhelming Yes! There is an effect!), the more interesting question would seem to be what you can learn from the data. The Pearson residuals, $$(\text{observed} - \text{expected}) / \sqrt{\text{expected}}$$, flag up some fine structure for behaviours 3 and 4: in environment 1, behaviour 3 is far more frequent than expected (residual 9.063) and behaviour 4 is absent (residual $$-1.754$$).
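The exact test mentioned in point 2 can be sketched by brute-force enumeration. The Python below assumes the Fisher-Freeman-Halton generalisation of Fisher's exact test to a 2×c table. It conditions on both margins, lists every possible first row, and sums the probabilities of all tables that are no more probable than the observed one. The function names are hypothetical, and this is not how Stata or any other package implements the test. It is only an illustration that is feasible because the table is small.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def row_fillings(col_totals, remaining):
    """Yield every first row that sums to `remaining` without exceeding a column total."""
    if len(col_totals) == 1:
        if remaining <= col_totals[0]:
            yield (remaining,)
        return
    for k in range(min(col_totals[0], remaining) + 1):
        for rest in row_fillings(col_totals[1:], remaining - k):
            yield (k,) + rest


def exact_test_2xc(table):
    """Return (p_value, total_mass) for a 2 x c table with both margins fixed."""
    row1, row2 = table
    col_totals = [a + b for a, b in zip(row1, row2)]
    n, r1 = sum(col_totals), sum(row1)
    log_denom = log_comb(n, r1)

    def log_prob(first_row):
        # Multivariate hypergeometric probability of this first row.
        return sum(log_comb(c, k) for c, k in zip(col_totals, first_row)) - log_denom

    log_p_obs = log_prob(row1)
    p_value = 0.0
    total_mass = 0.0  # should come out close to 1; a sanity check on the enumeration
    for first_row in row_fillings(col_totals, r1):
        lp = log_prob(first_row)
        total_mass += exp(lp)
        # Small tolerance so that ties with the observed table are included.
        if lp <= log_p_obs + 1e-7:
            p_value += exp(lp)
    return p_value, total_mass


p_value, total_mass = exact_test_2xc([[54, 15, 16, 0],
                                      [739, 201, 13, 39]])
print(f"exact P = {p_value:.3e} (probabilities sum to {total_mass:.6f})")
```

For this table the enumeration covers tens of thousands of candidate tables, which takes well under a minute in pure Python. For larger tables a Monte Carlo version or a dedicated routine would be the more practical choice.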
