Given two square symmetric covariance matrices whose sample sizes are not equal, can the following equation be used to compute the combined covariance? I have been reading this Wikipedia article https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Covariance but was wondering why the equation below is not right (if it is not right).

$$ C_x = \frac{C_a N_a + C_b N_b}{N_a + N_b} $$

where C denotes the covariance and N the sample size.
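For concreteness, here is a minimal pure-Python sketch of the proposed formula, assuming each covariance matrix is stored as a list of lists of the same dimension. The function name `combine_covariances` and the numbers below are illustrative only, not taken from the question.

```python
def combine_covariances(c_a, n_a, c_b, n_b):
    """Sample-size-weighted average C_x = (C_a * N_a + C_b * N_b) / (N_a + N_b).

    c_a, c_b: square covariance matrices as lists of lists (same shape).
    n_a, n_b: sample sizes behind each matrix.
    """
    if len(c_a) != len(c_b):
        raise ValueError("covariance matrices must have the same dimension")
    total = n_a + n_b
    # Combine entry by entry; the result stays symmetric if both inputs are.
    return [
        [(n_a * a + n_b * b) / total for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(c_a, c_b)
    ]


# Usage with made-up 2x2 matrices (illustrative numbers only).
c_a = [[2.0, 0.5], [0.5, 1.0]]
c_b = [[4.0, 1.0], [1.0, 3.0]]
c_x = combine_covariances(c_a, 10, c_b, 30)
print(c_x)  # [[3.5, 0.875], [0.875, 2.5]]
```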

Can anyone help me with this?


#### Best Answer

It's a little unclear what you're asking here (the Wikipedia page is about ordinary covariance, but the question relates to covariance *matrices*), but I'll try to answer the question with regard to *unbiasedness*.

Assuming your covariance matrices are computed from the samples $\{ \mathbf{a}_i \}_{i=1}^{N_a}$ and $\{ \mathbf{b}_j \}_{j=1}^{N_b}$, the usual definition of the sample covariance matrix is $$ \mathbf{C}_a = \frac{1}{N_a - 1} \sum_{i=1}^{N_a} ( \mathbf{a}_i - \bar{\mathbf{a}} ) ( \mathbf{a}_i - \bar{\mathbf{a}} )^T, $$ where $\bar{\mathbf{a}}$ is the sample mean, and similarly for $\mathbf{C}_b$. Note that the denominator $N_a - 1$ makes the sample covariance matrix unbiased: $E[\mathbf{C}_a] = \operatorname{Cov}(\mathbf{a}_i)$.
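The definition above can be sketched in plain Python as follows. The helper name `sample_covariance` and the three-point sample are hypothetical, chosen so the arithmetic is easy to check by hand; in practice `numpy.cov` with `rowvar=False` (observations as rows) computes the same $N - 1$-normalised matrix.

```python
def sample_covariance(samples):
    """Unbiased sample covariance matrix: sum (x_i - xbar)(x_i - xbar)^T / (N - 1).

    samples: list of N observations, each a list of d floats.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two observations")
    d = len(samples[0])
    # Sample mean vector xbar.
    mean = [sum(x[k] for x in samples) / n for k in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    # Accumulate the outer products of the deviations from the mean.
    for x in samples:
        dev = [x[k] - mean[k] for k in range(d)]
        for r in range(d):
            for c in range(d):
                cov[r][c] += dev[r] * dev[c]
    # Divide by N - 1 (not N) to get the unbiased estimator.
    return [[v / (n - 1) for v in row] for row in cov]


a = [[1.0, 2.0], [3.0, 6.0], [5.0, 7.0]]
print(sample_covariance(a))  # [[4.0, 5.0], [5.0, 7.0]]
```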

With this in mind, if you now compute the expected value of your proposed combined covariance, linearity of expectation gives $$ E[\mathbf{C}_x] = \frac{N_a E[\mathbf{C}_a] + N_b E[\mathbf{C}_b]}{N_a + N_b} = \frac{N_a}{N_a + N_b} \operatorname{Cov}(\mathbf{a}_i) + \frac{N_b}{N_a + N_b} \operatorname{Cov}(\mathbf{b}_j). $$ By itself this is of little use, but if we further assume that the two samples come from populations with equal covariance matrices (as is often done; see, e.g., Hotelling's $T^2$ test), that is, $\operatorname{Cov}(\mathbf{a}_i) = \operatorname{Cov}(\mathbf{b}_j) = \boldsymbol{\Sigma}$, then the two weights sum to one and $$ E[\mathbf{C}_x] = \boldsymbol{\Sigma}. $$ Thus $\mathbf{C}_x$ is unbiased for the common population covariance, and what you proposed is indeed the "correct" way of combining the two estimators.
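A quick Monte Carlo sanity check of this argument, offered as a hedged sketch: both samples are drawn from bivariate normals with the same, arbitrarily chosen $\boldsymbol{\Sigma}$ but different means, and averaging $\mathbf{C}_x$ over many repetitions should land close to $\boldsymbol{\Sigma}$. The value of $\boldsymbol{\Sigma}$, the sample sizes, the group means, the seed, and all helper names are assumptions made for illustration.

```python
import math
import random


def sample_covariance(samples):
    """Unbiased (N - 1 denominator) sample covariance matrix of a list of vectors."""
    n, d = len(samples), len(samples[0])
    mean = [sum(x[k] for x in samples) / n for k in range(d)]
    return [
        [sum((x[r] - mean[r]) * (x[c] - mean[c]) for x in samples) / (n - 1)
         for c in range(d)]
        for r in range(d)
    ]


def draw_bivariate_normal(mean, sigma, n, rng):
    """Draw n points from N(mean, sigma) for a 2x2 sigma via its Cholesky factor."""
    l11 = math.sqrt(sigma[0][0])
    l21 = sigma[1][0] / l11
    l22 = math.sqrt(sigma[1][1] - l21 ** 2)
    points = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        points.append([mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2])
    return points


def average_combined_covariance(sigma, n_a, n_b, reps, seed=0):
    """Monte Carlo estimate of E[C_x], where C_x = (N_a C_a + N_b C_b) / (N_a + N_b)."""
    rng = random.Random(seed)
    acc = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(reps):
        # Different means, same covariance sigma (the equal-covariance assumption).
        c_a = sample_covariance(draw_bivariate_normal([0.0, 0.0], sigma, n_a, rng))
        c_b = sample_covariance(draw_bivariate_normal([5.0, -3.0], sigma, n_b, rng))
        for r in range(2):
            for c in range(2):
                acc[r][c] += (n_a * c_a[r][c] + n_b * c_b[r][c]) / (n_a + n_b)
    return [[v / reps for v in row] for row in acc]


sigma = [[2.0, 0.6], [0.6, 1.0]]
estimate = average_combined_covariance(sigma, n_a=5, n_b=12, reps=10000)
print(estimate)  # each entry should be close to the matching entry of sigma
```

Because $\mathbf{C}_a$ and $\mathbf{C}_b$ each subtract their own sample mean, the different group means do not bias the result; only the shared covariance matters.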
