# Solved – Correlation of the multinomial distribution

Problem 1.14 from *Categorical Data Analysis*, 2nd edition.

For the multinomial distribution, show that
$$\operatorname{corr}(n_j,n_k)=\frac{-\pi_j\pi_k}{\sqrt{\pi_j(1-\pi_j)\pi_k(1-\pi_k)}}.$$
Show that $\operatorname{corr}(n_1,n_2)=-1$ when $c=2$.

The multinomial density is
$$p(n_1,n_2,\dots,n_{c-1})=\frac{n!}{n_1!\cdots n_c!}\,\pi_1^{n_1}\cdots\pi_c^{n_c}.$$
Let $n_j=\sum_i y_{ij}$, where each $y_{ij}$ is Bernoulli with $E[y_{ij}y_{ik}]=0$ for $j\ne k$, $E[y_{ij}]=\pi_j$, and $E[y_{ik}]=\pi_k$.

Then
$\sum_j n_j=n$, with dimension $(c-1)$ since $n_c=n-(n_1+n_2+\dots+n_{c-1})$. So each $n_j\sim \operatorname{Bin}(n,\pi_j)$,

$$\begin{cases}E[n_j]=n\pi_j\\ \operatorname{Var}(n_j)=n\pi_j(1-\pi_j)\end{cases}$$
then

$$\operatorname{corr}(n_j,n_k)=\frac{-n\pi_j\pi_k}{\sqrt{n\pi_j(1-\pi_j)\,n\pi_k(1-\pi_k)}}=\frac{-\pi_j\pi_k}{\sqrt{\pi_j(1-\pi_j)\pi_k(1-\pi_k)}}.$$

Is that right? How can I show the second part?
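Not part of the derivation itself, but the closed form above can be sanity-checked by simulation; a minimal sketch with NumPy, where the number of trials, the probability vector, and the sample size are all illustrative choices:

```python
import numpy as np

# Empirical check of corr(n_j, n_k) = -pi_j pi_k / sqrt(pi_j(1-pi_j) pi_k(1-pi_k)).
rng = np.random.default_rng(0)
n, pi = 50, np.array([0.2, 0.3, 0.5])

draws = rng.multinomial(n, pi, size=200_000)     # each row is (n_1, n_2, n_3)
empirical = np.corrcoef(draws[:, 0], draws[:, 1])[0, 1]

theory = -pi[0] * pi[1] / np.sqrt(pi[0] * (1 - pi[0]) * pi[1] * (1 - pi[1]))
print(empirical, theory)   # the two values should agree to about two decimals
```

With 200,000 draws the Monte Carlo error in the sample correlation is on the order of $1/\sqrt{200{,}000}\approx 0.002$, so close agreement is expected.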


The probability generating function is

$$\begin{aligned} f(x_1,\ldots, x_c) &= \sum_{k_1, \ldots, k_c} \Pr((X_1,\ldots,X_c)=(k_1,\ldots, k_c))\, x_1^{k_1}\cdots x_c^{k_c}\\ &= \sum_{k_1,\ldots,k_c} \binom{n}{k_1\cdots k_c} (\pi_1 x_1)^{k_1}\cdots (\pi_c x_c)^{k_c} \\ &= (\pi_1 x_1 + \cdots + \pi_c x_c)^n.\end{aligned}\tag{1}$$

The first equality is the definition of the pgf, the second is the formula for the multinomial distribution, and the third generalizes the Binomial Theorem (and often is taken to define the multinomial coefficients $\binom{n}{k_1\cdots k_c}$, whose values we do not need to know!).

Consequently (for $n \ge 2$ and $i\ne j$) the expectation of $X_iX_j$ is

$$\begin{aligned}\mathbb{E}(X_iX_j) &= \sum_{k_1, \ldots, k_c} \Pr((X_1,\ldots,X_c)=(k_1,\ldots, k_c))\, k_i k_j\\ &=\left(x_i x_j\frac{\partial^2}{\partial x_i\, \partial x_j}f\right)(1,1,\ldots,1) \\ &= (1)(1)\,n(n-1)\pi_i \pi_j (\pi_1 \cdot 1 + \cdots + \pi_c \cdot 1)^{n-2} \\ &= n(n-1)\pi_i \pi_j. \end{aligned}$$
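The differentiation step can also be verified symbolically; a sketch using SymPy, with the concrete illustrative values $n=4$ and $c=3$ (the answer's argument holds for any $n \ge 2$ and $c$):

```python
import sympy as sp

# Check that (x_i x_j * d^2 f / dx_i dx_j) evaluated at x = (1,...,1)
# equals n(n-1) pi_i pi_j (pi_1 + ... + pi_c)^(n-2), for a concrete n and c.
x1, x2, x3, p1, p2, p3 = sp.symbols('x1 x2 x3 p1 p2 p3', positive=True)
n = 4
f = (p1*x1 + p2*x2 + p3*x3)**n            # the pgf from formula (1)

moment = (x1 * x2 * sp.diff(f, x1, x2)).subs({x1: 1, x2: 1, x3: 1})
expected = n*(n - 1)*p1*p2 * (p1 + p2 + p3)**(n - 2)

assert sp.simplify(moment - expected) == 0
print(sp.factor(moment))
```

Substituting $p_1+p_2+p_3=1$ then gives $n(n-1)p_1p_2$, matching the fourth line of the display above.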

The first equality is the definition of expectation, the second is the result of differentiating the preceding sum term-by-term, the third is the result of differentiating formula $(1)$ instead, and the fourth follows from the law of total probability, $\pi_1 + \cdots + \pi_c = 1$.

(Obviously this formula for the expectation continues to hold when $n=0$ or $n=1$.)

Therefore (using a well-known formula for the covariance in terms of the first two moments and recognizing that $\mathbb{E}(X_k) = n\pi_k$ for any $k$),

$$\operatorname{Cov}(X_i, X_j) = \mathbb{E}(X_iX_j) - \mathbb{E}(X_i)\,\mathbb{E}(X_j) = n(n-1)\pi_i\pi_j - (n\pi_i)(n\pi_j) = -n\pi_i\pi_j.$$

The rest is easy algebra.
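For the asker's second part, the "easy algebra" amounts to substituting $c=2$, where $\pi_2 = 1-\pi_1$ and $1-\pi_2 = \pi_1$:

$$\operatorname{corr}(n_1,n_2)=\frac{-\pi_1\pi_2}{\sqrt{\pi_1(1-\pi_1)\pi_2(1-\pi_2)}}=\frac{-\pi_1(1-\pi_1)}{\sqrt{\pi_1(1-\pi_1)(1-\pi_1)\pi_1}}=\frac{-\pi_1(1-\pi_1)}{\pi_1(1-\pi_1)}=-1.$$

This makes intuitive sense: with two categories, $n_2 = n - n_1$ is a deterministic decreasing function of $n_1$.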
