# Solved – Understanding the Chi-squared test and the Chi-squared distribution

I am trying to understand the logic behind the chi-squared test.

The chi-squared test statistic is $\chi^2 = \sum \frac{(obs-exp)^2}{exp}$. $\chi^2$ is then compared to a chi-squared distribution to obtain a p-value, in order to decide whether or not to reject the null hypothesis $H_0$: the observations come from the distribution we used to create our expected values. For example, we could test whether the probability of obtaining `heads` is $p$, as we expect. So we flip the coin 100 times and find $n_H$ `heads` and $100-n_H$ `tails`. We want to compare our finding to what is expected ($100 \cdot p$ heads). We could as well use a binomial distribution but it is not the point of the question… The question is:
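For concreteness, here is a minimal Python sketch of the statistic for the coin example, assuming hypothetical data of 60 heads in 100 flips and $H_0\colon p = 0.5$. The helper names `chi_squared_statistic` and `chi2_df1_pvalue` are illustrative only, not from any library. With two cells (heads and tails) the reference distribution is $\chi^2_1$, whose upper tail can be computed with the standard library's `math.erfc`, because $P(Z^2 > x) = \operatorname{erfc}(\sqrt{x/2})$ for a standard normal $Z$.

```python
import math


def chi_squared_statistic(observed, expected):
    """Pearson's statistic: sum of (obs - exp)^2 / exp over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_df1_pvalue(x):
    """Upper-tail probability P(X > x) for X ~ chi-squared with 1 degree of freedom.

    Uses P(Z^2 > x) = erfc(sqrt(x / 2)) for a standard normal Z.
    """
    return math.erfc(math.sqrt(x / 2))


# Hypothetical data: 100 flips, 60 heads, testing H0: p = 0.5.
n, p, n_heads = 100, 0.5, 60
observed = [n_heads, n - n_heads]   # heads, tails
expected = [n * p, n * (1 - p)]     # expected heads, expected tails
stat = chi_squared_statistic(observed, expected)
p_value = chi2_df1_pvalue(stat)
print(f"chi2 = {stat:.3f}, p-value = {p_value:.4f}")
```

Here the statistic is $(60-50)^2/50 + (40-50)^2/50 = 4$, and the p-value equals $P(|Z| > 2)$.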

Can you please explain why, under the null hypothesis, $\sum \frac{(obs-exp)^2}{exp}$ follows a chi-squared distribution?

All I know about the chi-squared distribution is that the chi-squared distribution with $k$ degrees of freedom is the distribution of the sum of the squares of $k$ independent standard normal random variables.
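That definition can be checked by simulation. This sketch (the seed, $k = 3$ and the number of draws are arbitrary choices) sums $k$ squared standard normal draws from Python's `random.gauss` and compares the sample mean and variance with the theoretical values $k$ and $2k$ of a $\chi^2_k$ variable.

```python
import random
import statistics


def sample_chi_squared(k, rng):
    """One draw of the sum of k squared independent standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


rng = random.Random(12345)  # fixed seed so the run is reproducible
k, n_draws = 3, 100_000
draws = [sample_chi_squared(k, rng) for _ in range(n_draws)]

# A chi-squared variable with k degrees of freedom has mean k and variance 2k.
sample_mean = statistics.fmean(draws)
sample_var = sum((x - sample_mean) ** 2 for x in draws) / (n_draws - 1)
print(f"mean ~ {sample_mean:.3f} (theory {k}), "
      f"variance ~ {sample_var:.3f} (theory {2 * k})")
```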


> We could as well use a binomial distribution but it is not the point of the question…

Nevertheless, it is our starting point even for your actual question. I'll cover it somewhat informally.

Let's consider the binomial case more generally:

$$Y\sim \text{Bin}(n,p)$$

Assume $n$ and $p$ are such that $Y$ is well approximated by a normal with the same mean and variance (some typical requirements are that $\min(np, n(1-p))$ is not small, or that $np(1-p)$ is not small).

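Then $(Y-E(Y))^2/\text{Var}(Y)$ will be approximately $\chi^2_1$ distributed, since it is the square of $(Y-E(Y))/\sqrt{\text{Var}(Y)}$, which is approximately standard normal. Here $Y$ is the number of successes.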

We have $E(Y) = np$ and $\text{Var}(Y)=np(1-p)$.

(In the testing case, $n$ is known and $p$ is specified under $H_0$. We don't do any estimation.)

So if $H_0$ is true, $(Y-np)^2/[np(1-p)]$ will be approximately $\chi^2_1$ distributed.
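How good this approximation is can be seen without simulation, by summing exact binomial probabilities. The sketch below assumes $p = 0.3$ and a test at the 5% level, whose critical value $3.8415$ is the 95th percentile of $\chi^2_1$; the function names are illustrative. It computes the exact probability under $H_0$ that $(Y-np)^2/[np(1-p)]$ exceeds the critical value. If the $\chi^2_1$ approximation were exact this would be $0.05$; it should generally get closer as $n$ grows, although the discreteness of $Y$ makes the convergence uneven.

```python
import math

CHI2_1_CRIT_05 = 3.841458820694124  # 95th percentile of chi-squared with 1 df (1.96^2)


def binom_pmf(y, n, p):
    """Binomial probability P(Y = y), computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
               + y * math.log(p) + (n - y) * math.log1p(-p))
    return math.exp(log_pmf)


def exact_rejection_rate(n, p, crit=CHI2_1_CRIT_05):
    """Exact P((Y - np)^2 / (np(1-p)) > crit) when Y ~ Bin(n, p), i.e. under H0."""
    var = n * p * (1 - p)
    return sum(binom_pmf(y, n, p) for y in range(n + 1)
               if (y - n * p) ** 2 / var > crit)


for n in (20, 100, 1000):
    print(n, round(exact_rejection_rate(n, 0.3), 4))
```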

Note that $(Y-np)^2 = [(n-Y)-n(1-p)]^2$. Also note that $\frac{1}{p} + \frac{1}{1-p} = \frac{1}{p(1-p)}$.

Hence

$$\begin{aligned}
\frac{(Y-np)^2}{np(1-p)} &= \frac{(Y-np)^2}{np}+\frac{(Y-np)^2}{n(1-p)}\\
&= \frac{(Y-np)^2}{np}+\frac{[(n-Y)-n(1-p)]^2}{n(1-p)}\\
&= \frac{(O_S-E_S)^2}{E_S}+\frac{(O_F-E_F)^2}{E_F}
\end{aligned}$$

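This is just the chi-square statistic for the binomial case, where $O_S = Y$ and $E_S = np$ are the observed and expected numbers of successes, and $O_F = n-Y$ and $E_F = n(1-p)$ are the observed and expected numbers of failures.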
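The algebra can also be sanity-checked numerically. This sketch (function names are illustrative, and the $(n, p)$ pairs are arbitrary) evaluates both sides of the identity for every possible count $y$ and confirms that they agree up to floating-point rounding.

```python
import math


def pearson_two_cell(y, n, p):
    """(O_S - E_S)^2 / E_S + (O_F - E_F)^2 / E_F for successes S and failures F."""
    o_s, e_s = y, n * p
    o_f, e_f = n - y, n * (1 - p)
    return (o_s - e_s) ** 2 / e_s + (o_f - e_f) ** 2 / e_f


def squared_z(y, n, p):
    """(Y - np)^2 / (np(1-p)): the squared standardized binomial count."""
    return (y - n * p) ** 2 / (n * p * (1 - p))


# Compare both sides over every possible count y for a few (n, p) pairs.
identity_holds = all(
    math.isclose(pearson_two_cell(y, n, p), squared_z(y, n, p),
                 rel_tol=1e-9, abs_tol=1e-12)
    for n, p in [(10, 0.5), (100, 0.3), (57, 0.91)]
    for y in range(n + 1)
)
print(identity_holds)  # True
```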

So in this case the chi-square statistic has approximately the distribution of the square of a standard normal random variable, that is, $\chi^2_1$.
