I am comparing observed counts with expected counts generated by assuming equal probability. My data, in R, are as follows:
```r
All <- matrix(c(51, 51, 76, 26), nrow=2, ncol=2)
All
     [,1] [,2]
[1,]   51   76
[2,]   51   26
```
When I run the chi-square, these are my results:
```r
chisq.test(All)

        Pearson's Chi-squared test with Yates' continuity correction

data:  All
X-squared = 12.016, df = 1, p-value = 0.0005275
```
This makes sense, but when I do the calculation by hand in Excel, using the formula ((|O−E|−0.5)^2)/E, I come up with a very different X² value: 23.539.
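The same hand calculation, done in R rather than Excel, gives the identical value:

```r
O <- c(76, 26)  # observed counts
E <- c(51, 51)  # expected counts under equal probability
sum((abs(O - E) - 0.5)^2 / E)
# [1] 23.53922
```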
I have triple-checked the formula, and I know that my input is the same as in R (O = 76, 26; E = 51, 51).
What is going on? I have seen this question posed elsewhere ("Exact formula Yates' correction in R"), but there the discrepancy between R and Excel was resolved by taking the absolute value into account, which I have already done. Could the large difference in X² values really be the result of R using the smallest residual, instead of just 1/2 as I use in Excel?
Best Answer
When you call chisq.test on a matrix, you're telling R you want a chi-square test of independence on a matrix of observed values. What you appear to be trying to do is a chi-square goodness-of-fit test.
Yates' correction is normally applied to chi-square tests of independence, rather than to goodness-of-fit tests (this is also the case in R). More importantly, the test of independence computes its expected counts from the row and column margins of the matrix, not from your assumption of equal probability, which is why R's expected counts (and hence its statistic) differ from yours.
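To see where R's 12.016 comes from, here is a sketch reproducing the independence-test statistic by hand, using expected counts derived from the margins:

```r
All <- matrix(c(51, 51, 76, 26), nrow = 2, ncol = 2)

# Expected counts for a test of independence: outer product of the
# row and column totals, divided by the grand total.
E <- outer(rowSums(All), colSums(All)) / sum(All)
E
#      [,1] [,2]
# [1,] 63.5 63.5
# [2,] 38.5 38.5

# Yates-corrected statistic -- the same formula used in Excel, but
# with these margin-based expected counts.
sum((abs(All - E) - 0.5)^2 / E)
# [1] 12.01595
```

With E = 63.5 and 38.5 rather than 51, the corrected statistic matches the 12.016 that chisq.test reports.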
To perform a goodness-of-fit test on your data in R, try prop.test(76, 26+76).
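As a sketch of that suggestion: prop.test tests whether the proportion 76/(26+76) differs from 0.5, and with its default continuity correction its statistic is exactly the 23.539 computed by hand in Excel:

```r
res <- prop.test(76, 26 + 76)  # default p = 0.5, with continuity correction
res$statistic
# X-squared
#  23.53922

# An equivalent goodness-of-fit form (note: chisq.test with the p
# argument applies no continuity correction, so the value differs):
chisq.test(c(76, 26), p = c(0.5, 0.5))$statistic
# X-squared
#   24.5098
```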