In an online course, we are working through some linear discriminant analysis and I've been given an example. I am having trouble with the language used by the professor as it seems I am misunderstanding the assumptions regarding covariance.

The exact language for the example is as follows:

"Your data is from two classes and they are both Gaussian distributed. They have the same covariance but different means, as you can see from this picture (below)."

My question arises with respect to the covariance. Below is what I have written to the instructor:

Is there a way that two variables in a bivariate Gaussian could NOT have the same covariance? My understanding is that they would necessarily have the same covariance since Cov(X,Y) = Cov(Y,X).

Are we also assuming in this situation that each variable has similar variance so that var(x) = var(y) as well (and as such, their covariance matrices are exactly equal)? Or are we only assuming their covariances to be equal, which must necessarily be true?

I generally would not post a reply from an instructor publicly but I am in great need of clarification. Here is the reply I received:

Here's a picture of a basic classification problem (below):

Is there a way that two variables in a bivariate gaussian could NOT have the same covariance?

I think maybe it's that you're not understanding what covariance we're looking at?

For example here we can see that these 2 groups have a different amount of "spread". The bottom group is spread out more.

It should be clear visually that their covariances are not equal.

Are we also assuming in this situation that each variable has similar variance so that var(x) = var(y) as well

I think perhaps it's due to a misunderstanding of what covariance is. In fact, the variances of each independent variable are just the entries along the diagonal of the covariance matrix.

has similar variance so that var(x) = var(y) as well (and as such, their covariance matrices are exactly equal)?

You have the implication backwards. A valid sentence is: "their covariances are equal, therefore, the variance of each component is equal". The reverse is not true.

I followed up with another question to the instructor but it seems I must be misunderstanding the meaning of the problem statement.

My understanding is that a covariance matrix is always symmetric. However, the entries along the DIAGONAL don't have to be equal, since the variances along the diagonal can all be unique. I can't tell if he's trying to say we are meant to assume that they have the exact same covariance matrix, which would imply that the VARIANCE of samples from each class would be equal.

I also note that the last statement in the reply very much confuses me. If two variables have different variances but are independent, cov(x,y) and cov(y,x) will both be zero, so different variances can easily go together with the same covariance. And cov(x,y) = cov(y,x) always, so as I understand it, equal covariances do not imply anything about equal variances.

Note: I have studied some LDA from other sources as well and believe that the VARIANCE of the two classes does need to be assumed unique. When he refers to the classes having the "same covariance," am I to understand that this means they have the exact same covariance MATRIX and, as such, must have the exact same variance? If that is the case, can anyone illuminate what I'm misunderstanding about the definition or meaning of covariance?


#### Best Answer

Your confusion arises from the fact that there are two different populations on the same multidimensional space. To clarify, let's play with a concrete example.

We have two populations $\mathcal{A}$ (people in Argentina) and $\mathcal{B}$ (people in Brazil). Each is described using the **same** two features $X, Y$ ($X$ is height, $Y$ is weight).

Now, in general $Cov_{\mathcal{A}}\left(X,Y\right) \neq Cov_{\mathcal{B}}\left(X,Y\right)$. That is, the relationship between height and weight in Argentina might be different than the relationship in Brazil. This case is what the instructor tried to emphasize. However, in the original question, we assume equality instead.
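To make the distinction concrete, here is a minimal NumPy sketch. All numbers are hypothetical (they are made up for illustration and are not real height/weight data), and the variable names are my own. It draws population A and two versions of population B: one sharing A's covariance matrix and one with a different height/weight relationship.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical parameters, chosen only for illustration.
mean_a = np.array([170.0, 70.0])   # (height, weight) means for population A
mean_b = np.array([175.0, 80.0])   # different means for population B

# Shared covariance matrix: note Var(X) = 40 and Var(Y) = 90 are NOT equal.
sigma_shared = np.array([[40.0, 25.0],
                         [25.0, 90.0]])
# Same variances as above, but a different Cov(X, Y).
sigma_other = np.array([[40.0, -10.0],
                        [-10.0, 90.0]])

n = 200_000
a = rng.multivariate_normal(mean_a, sigma_shared, size=n)
b_same = rng.multivariate_normal(mean_b, sigma_shared, size=n)
b_diff = rng.multivariate_normal(mean_b, sigma_other, size=n)


def sample_cov(samples):
    """2x2 sample covariance matrix; rows are observations, columns are (X, Y)."""
    return np.cov(samples, rowvar=False)


cov_a = sample_cov(a)
cov_b_same = sample_cov(b_same)
cov_b_diff = sample_cov(b_diff)

print("A:\n", cov_a)
print("B (same covariance as A):\n", cov_b_same)
print("B (different covariance):\n", cov_b_diff)
```

Each printed matrix is symmetric on its own, which is the Cov(X,Y) = Cov(Y,X) fact from the question. The "same covariance" assumption is a separate statement that compares the matrix of A with the matrix of B, and it holds for `b_same` but not for `b_diff`.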

You should note that the **covariance matrix** for Argentina in our case is the following *symmetric* positive definite matrix:

$$ \Sigma_{\mathcal{A}} = \begin{pmatrix} Var_{\mathcal{A}}\left(X\right) & Cov_{\mathcal{A}}\left(X,Y\right) \\ Cov_{\mathcal{A}}\left(Y,X\right) & Var_{\mathcal{A}}\left(Y\right) \end{pmatrix} $$

Finally, it doesn't make much sense to talk about the covariance between population $\mathcal{A}$ and population $\mathcal{B}$. Covariance can be calculated only between random variables taken from the same multivariate distribution.
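Spelled out in this notation, the "same covariance" assumption from the problem statement would read as follows. This is a sketch that assumes $\Sigma_{\mathcal{B}}$ is defined for Brazil in the same way as $\Sigma_{\mathcal{A}}$ is for Argentina:

$$ \Sigma_{\mathcal{A}} = \Sigma_{\mathcal{B}} \iff \begin{cases} Var_{\mathcal{A}}\left(X\right) = Var_{\mathcal{B}}\left(X\right) \\ Var_{\mathcal{A}}\left(Y\right) = Var_{\mathcal{B}}\left(Y\right) \\ Cov_{\mathcal{A}}\left(X,Y\right) = Cov_{\mathcal{B}}\left(X,Y\right) \end{cases} $$

The equalities compare the two populations feature by feature. They do not require $Var_{\mathcal{A}}\left(X\right) = Var_{\mathcal{A}}\left(Y\right)$ within a single population, so the diagonal entries of the shared matrix can still differ from each other.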
