I want to test whether two observations of nominal data follow the same distribution. I am using the chi-squared statistic to perform a chi-squared **homogeneity test** and normalize the result with Cramér's $\phi$.

Unfortunately, all the examples of a chi-squared **homogeneity test** I could find (e.g. here) perform the test with two one-dimensional observations. The example behind the link above, for instance, compares boys and girls by their viewing preferences. This makes two observations of the form $[x_0, \ldots, x_n]$. However, I want to test observations of the form $[[x_0, \ldots, x_n], \ldots, [z_0, \ldots, z_m]]$. I don't know if "multidimensional" is the right term in this context; I figured it might be.

Could you explain to me how a chi-squared **homogeneity test** can be performed with multi-row observations? I know it works, because scipy returns results for two-dimensional input lists.

```python
import scipy.stats as sps

observation1 = [[95, 31, 20], [70, 29, 18]]
observation2 = [[21, 69, 98], [54, 35, 11]]
data = [observation1, observation2]
print(sps.chi2_contingency(data))
```

The code above yields

```
(159.18016188570166, 4.772222443744986e-31, 7,
 array([[[ 69.44008748,  47.45072645,  42.53205358],
         [ 45.11526642,  30.82876539,  27.63310068]],

        [[ 76.04085626,  51.96125177,  46.57502446],
         [ 49.40378984,  33.75925639,  30.25982128]]]))
```

where the first value is the $\chi^2$ statistic. Flattening the observations yields different values, so there must be a difference.
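The difference is easy to see directly: reshaping the two observations into a single two-way table changes the statistic. A small sketch (I assume here that "flattening" means turning each observation into one row of a 2 × 6 table):

```python
import numpy as np
from scipy.stats import chi2_contingency

# The two observations, stacked into a 2 x 2 x 3 array
data = np.array([[[95, 31, 20], [70, 29, 18]],
                 [[21, 69, 98], [54, 35, 11]]])

chi2_3d = chi2_contingency(data)[0]                   # three-way table
chi2_flat = chi2_contingency(data.reshape(2, 6))[0]   # flattened: 2 x 6

# chi2_3d reproduces 159.18...; chi2_flat is a different statistic,
# because a 2 x 6 table only constrains two sets of marginals
print(chi2_3d, chi2_flat)
```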

**How do you calculate $\chi^2$ to determine the homogeneity of multidimensional observations?** Please note that I know how to calculate $\chi^2$ on a multi-row contingency table. What I want to know is how to perform a test for the homogeneity of two contingency tables.

**Example:**

```
Table 1
         outcome0  outcome1  outcome2  sum
action0        95        31        20  146
action1        70        29        18  117
sum           165        60        38  263

Table 2
         outcome0  outcome1  outcome2
action0        21        69        98
action1        54        35        11
```

Question: Do these observations follow the same distribution?

Scipy can determine the expected values (and $\chi^2$) of multi-dimensional observations. The expected values are returned as the last element of the tuple returned by `chi2_contingency`:

```
Expected table 1
         outcome0  outcome1  outcome2
action0     69.44     47.45     42.53
action1     45.12     30.83     27.63

Expected table 2
         outcome0  outcome1  outcome2
action0     76.04     51.96     46.58
action1     49.40     33.76     30.26
```

The sum of the squared differences, each divided by the expected value, yields the $\chi^2$ value. Note, however, that the expected tables are not just the expected values computed from each individual table (quick proof for the first cell of table 1: $\frac{146 \cdot 165}{263} = 91.60 \neq 69.44$). Instead, their calculation seems to depend on each other somehow.

**I'd like to know how the expected values depend on each other, so that I can calculate it myself.**
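For what it's worth, scipy's expected frequencies for an $n$-way table come from assuming mutual independence of all factors: each cell's expectation is the product of its one-way marginals divided by $N^{d-1}$, where $N$ is the grand total over *both* tables and $d$ is the number of dimensions. So the two expected tables are coupled only through the pooled marginals and the shared grand total. A sketch reproducing scipy's numbers with plain NumPy:

```python
import numpy as np

# Observed counts, shape (observation, action, outcome) = (2, 2, 3)
data = np.array([[[95, 31, 20], [70, 29, 18]],
                 [[21, 69, 98], [54, 35, 11]]])

N = data.sum()  # grand total over both tables: 551

# One marginal per axis (factor): sum over all the other axes
marginals = [data.sum(axis=tuple(a for a in range(data.ndim) if a != k))
             for k in range(data.ndim)]

# Mutual independence: expected = outer product of marginals / N^(d-1)
expected = np.einsum('i,j,k->ijk', *marginals) / N ** (data.ndim - 1)

# Pearson statistic; degrees of freedom = prod(dims) - 1 - sum(dims - 1) = 7
chi2 = ((data - expected) ** 2 / expected).sum()
print(expected[0, 0, 0], chi2)  # 69.44..., 159.18...
```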


#### Best Answer

To analyze a multi-way contingency table, you use log-linear models. In truth, log-linear models are a special case of the Poisson generalized linear model, so you could do that, but log-linear models are more user-friendly. In Python, you may need to use the Poisson GLM, as I gather log-linear models may not be implemented. I will demonstrate the log-linear model using your data with R.

```r
library(MASS)
tab = array(c(95, 31, 20, 70, 29, 18, 21, 69, 98, 54, 35, 11), dim=c(3,2,2))
tab = as.table(tab)
names(dimnames(tab)) = c("outcomes", "actions", "observations")
dimnames(tab)[[1]] = c("0", "1", "2")
dimnames(tab)[[2]] = c("0", "1")
dimnames(tab)[[3]] = c("1", "2")
tab
# , , observations = 1
#         actions
# outcomes  0  1
#        0 95 70
#        1 31 29
#        2 20 18
#
# , , observations = 2
#         actions
# outcomes  0  1
#        0 21 54
#        1 69 35
#        2 98 11
```

Log-linear models are simply a series of goodness of fit tests. We can start with a (trivial) null model that assumes all cells have the same expected value:

```r
summary(tab)
# Number of cases in table: 551
# Number of factors: 3
# Test for independence of all factors:
#   Chisq = 159.18, df = 7, p-value = 4.772e-31
```

The null is rejected. Next, we can fit a saturated model:

```r
m.sat = loglm(~observations*actions*outcomes, tab)
m.sat
# Call:
# loglm(formula = ~observations * actions * outcomes, data = tab)
#
# Statistics:
#                  X^2 df P(> X^2)
# Likelihood Ratio   0  0        1
# Pearson            0  0        1
```

Naturally, this fits perfectly. At this point, we could build up from the null model, seeing if additional terms improve the fit, or drop terms from the saturated model to see if the fit gets significantly worse. The latter is more convenient and is conventional. To see if the distribution of `outcomes` by `actions` differs as a function of the `observation`, we need to drop the interactions between `observations` and `actions * outcomes`. If we also drop the marginal effect of `observations`, we are testing if the mean count differs between the two levels of `observations`. That may or may not be of interest to you, I don't know.

```r
m1 = loglm(~observations + actions*outcomes, tab)
sum(tab[,,1])  # 263
sum(tab[,,2])  # 288
m2 = loglm(~actions*outcomes, tab)
anova(m2, m1)
# LR tests for hierarchical log-linear models
#
# Model 1:
#  ~actions * outcomes
# Model 2:
#  ~observations + actions * outcomes
#
#             Deviance df Delta(Dev) Delta(df) P(> Delta(Dev))
# Model 1   126.4172  6
# Model 2   125.2825  5   1.134691         1         0.28678
# Saturated   0.0000  0 125.282534         5         0.00000
```

`Model 1` has dropped a single degree of freedom from `Model 2` (note that, confusingly, `Model 1` $\leftrightarrow$ `m2`, and `Model 2` $\leftrightarrow$ `m1`), but the decrease in model fit is very small. It is not significant. There is not enough evidence to suggest that the mean counts differ by `observation`. On the other hand, when `Model 2` is compared to the `Saturated` model, the decrease in fit is highly significant. The data are inconsistent with the idea that the distribution of counts is the same in both levels of `observation`.