What is the best (simplest and most robust) test statistic for measuring the overall degree of association (interdependence, correlation, or covariance?) among multiple binary variables?
I have been looking at multiple regression, but I think it is too complex for this purpose: it models the actual relationship for prediction rather than measuring the degree of correlation.
So let's say we have k binary (Bernoulli) variables and a sample of n observations per variable, where each variable takes its positive value with a given frequency/probability f.
How would we measure the degree of correlation among these variables, and how does the p-value of that metric depend on n, k, and f?
Best Answer
First, whatever you use, it won't be correlation. Correlation, in the usual sense, is about two variables.
Second, there is no simple way to do this because "degree of association" is not easily defined with multiple variables.
Third, as @NickCox commented yesterday, some people do principal components analysis on binary data, but 1) this isn't simple, 2) it's a bit controversial, and 3) it may not give you what you want.
Fourth, have you considered log-linear analysis? This is a sort of generalization of the chi-square test to tables with more than two variables, and it does not require you to designate any variable as the dependent one.
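
To make the log-linear suggestion a bit more concrete, here is a minimal sketch of its simplest case: testing the mutual-independence model on the 2^k table with the likelihood-ratio statistic G. This is only one possible choice, not the definitive method, and the function names (`g_statistic`, `permutation_p_value`) and the data layout (a list of 0/1 tuples, one tuple per observation) are my own assumptions for illustration. The p-value is estimated by shuffling each column independently, which avoids relying on the chi-square approximation when many cells are sparse.

```python
import math
import random
from collections import Counter


def g_statistic(rows):
    """Likelihood-ratio statistic G for mutual independence of k binary columns.

    rows: list of equal-length tuples of 0/1 values (hypothetical layout).
    """
    n = len(rows)
    k = len(rows[0])
    # Marginal frequency of the positive case for each variable.
    p = [sum(r[j] for r in rows) / n for j in range(k)]
    # Observed counts for each cell of the 2^k table (empty cells add 0 to G).
    counts = Counter(tuple(r) for r in rows)
    g = 0.0
    for cell, obs in counts.items():
        # Expected count if all k variables were mutually independent.
        expected = n
        for j, value in enumerate(cell):
            expected *= p[j] if value == 1 else 1 - p[j]
        g += 2 * obs * math.log(obs / expected)
    return g


def permutation_p_value(rows, n_perm=999, seed=0):
    """Estimate P(G >= observed) by shuffling each column independently."""
    rng = random.Random(seed)
    observed = g_statistic(rows)
    cols = [list(c) for c in zip(*rows)]
    hits = 0
    for _ in range(n_perm):
        for c in cols:
            rng.shuffle(c)  # breaks any association, keeps each marginal f
        if g_statistic(list(zip(*cols))) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Three perfectly dependent variables versus three balanced, independent ones.
dependent = [(0, 0, 0)] * 50 + [(1, 1, 1)] * 50
independent = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)] * 10
print(g_statistic(dependent), g_statistic(independent))
```

If you prefer the large-sample approximation, G under mutual independence is referred to a chi-square distribution with 2^k - k - 1 degrees of freedom. That shows how the ingredients enter: k sets the degrees of freedom, n scales G for a fixed pattern of association, and f values near 0 or 1 leave many cells nearly empty, which is when the approximation becomes unreliable and the permutation version is the safer choice.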