In which cases can't I perform Pearson's chi-squared test because the outcome would be meaningless?
Edit: Is the test still valid when the underlying distribution has no finite mean or variance?
Best Answer
"Chi-square test" is a general term covering many tests, several of which were first developed by Karl Pearson at the beginning of the 20th century. Many test statistics have an asymptotic chi-square distribution, so the answer to your question depends somewhat on which chi-square test you mean. I can answer for the contingency table and goodness-of-fit tests.

The asymptotic approximation does not work well when the contingency table has "sparse cells", that is, cells with small expected counts. This was studied extensively by William Cochran, and several rules of thumb are in use. A common one requires the expected count in every cell to be 5 or more under the null hypothesis. Others only require that a very small percentage of the cells have an expected count under 5. Some people also apply these rules to the observed cell frequencies.

SAS performs a check based on a rule like the ones I've described and warns that the approximation may not be accurate when the check fails. For more details, see Agresti's book "Categorical Data Analysis" or this Wikipedia article: http://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test
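To make the expected-count rule of thumb concrete, here is a minimal Python sketch of such a check for a two-way contingency table. It is not how SAS or any particular package implements its warning. The function names `expected_counts` and `sparse_cell_report`, the default threshold of 5, and the two example tables are all assumptions made up for illustration. Expected counts under independence are computed as row total times column total divided by the grand total.

```python
def expected_counts(table):
    """Expected cell counts under independence.

    Each expected count is (row total * column total) / grand total.
    `table` is a list of rows of observed counts (hypothetical input format).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def sparse_cell_report(table, min_expected=5.0):
    """Summarise how many cells have an expected count below `min_expected`.

    The threshold of 5 follows the common rule of thumb; what fraction of
    small cells is tolerable is left to the reader.
    """
    cells = [e for row in expected_counts(table) for e in row]
    n_small = sum(1 for e in cells if e < min_expected)
    return {
        "min_expected": min(cells),
        "n_small": n_small,
        "fraction_small": n_small / len(cells),
    }


# Illustrative tables (made-up counts, not real data).
dense = [[30, 20], [20, 30]]   # every expected count is comfortably large
sparse = [[12, 1], [9, 2]]     # second column is rare, so its cells are sparse

print(sparse_cell_report(dense))
print(sparse_cell_report(sparse))
```

If the report shows cells below the threshold, the chi-square p-value should be treated with caution. For a 2x2 table, an exact test such as Fisher's is a common alternative.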