Solved – the geometric relationship between the covariance matrix and the inverse of the covariance matrix

The covariance matrix represents the dispersion of data points, while the inverse of the covariance matrix represents the tightness of data points. How are dispersion and tightness related geometrically?

For example, the determinant of the covariance matrix represents the volume of the dispersion of data points. What does the determinant of the inverse of the covariance matrix represent? The determinant is related to volume, but I don't understand how to interpret the volume of the inverse of the covariance matrix (or the volume of the information matrix).

Similarly, the trace roughly represents the mean squared error of the data points, but what does the trace of the inverse of the covariance matrix represent?

I don't quite understand how to interpret the inverse of the covariance matrix geometrically, or how it is related to the covariance matrix.

Before I answer your questions, allow me to share how I think about covariance and precision matrices.

Covariance matrices have a special structure: they are positive semi-definite (PSD), which means that for a covariance matrix $\Sigma$ of size $m \times m$ and any vector $x$ of size $m \times 1$, we have $x^T \Sigma x \geq 0$.
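
As a quick numerical sanity check of that definition, here is a minimal numpy sketch; the matrix values are arbitrary, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])   # an example 2x2 covariance matrix (illustrative values)

# Check the PSD property x^T Sigma x >= 0 for many random vectors x.
xs = rng.normal(size=(1000, 2))
quad_forms = np.einsum('ij,jk,ik->i', xs, Sigma, xs)   # one quadratic form per row of xs
print(np.all(quad_forms >= 0))                          # True
```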

Such matrices enjoy a very nice property: they can be decomposed as $\Sigma = R\Lambda R^T$, where $R$ is a rotation matrix and $\Lambda$ is a diagonal matrix.
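
Here is a small sketch of that decomposition (same illustrative matrix as above, values made up), using numpy's eigendecomposition for symmetric matrices:

```python
import numpy as np

Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])   # illustrative covariance matrix

# eigh is meant for symmetric (PSD) matrices; it returns the eigenvalues
# (the diagonal of Lambda) and orthonormal eigenvectors (the columns of R).
eigvals, R = np.linalg.eigh(Sigma)
Lambda = np.diag(eigvals)

# Reconstruct Sigma = R Lambda R^T and confirm it matches the original.
print(np.allclose(Sigma, R @ Lambda @ R.T))   # True
```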

Now that we have the definition out of the way, let's take a look at what this means with the help of a $\Sigma$ of size $2 \times 2$ (i.e. our dataset has two variables). In figure (a) of the image below, we see an identity covariance matrix, which implies no correlation between the data variables. This can be drawn as a circle. Below the image, we see the identity covariance matrix decomposed into its $\Sigma = R\Lambda R^T$ form.

In figure (b), we see what happens to the geometry if we scale the variances of the variables by two different factors. The variables are still uncorrelated, but their respective variances are now $\lambda_1$ and $\lambda_2$. Now how do we introduce correlation into the mix? We rotate the ellipse with the help of a rotation matrix, which for figure (c) is simply:

$R = \begin{bmatrix} \cos(\theta) & \sin(\theta)\\ -\sin(\theta) & \cos(\theta) \end{bmatrix}$

Rotation matrices have a nice property: they are orthonormal, and $RR^T = I$, therefore $R^T = R^{-1}$.
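
A tiny check of that property, using the same rotation matrix convention as above (the angle is an arbitrary choice for illustration):

```python
import numpy as np

theta = np.deg2rad(30.0)   # arbitrary angle, for illustration only
R = np.array([[np.cos(theta),  np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])

print(np.allclose(R @ R.T, np.eye(2)))      # R is orthonormal: R R^T = I
print(np.allclose(R.T, np.linalg.inv(R)))   # so its transpose is its inverse
```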

After that digression, let's come back to our covariance matrix. For $\Sigma$: $\Sigma = R\Lambda R^T = \begin{bmatrix} R_{11} & R_{12}\\ R_{21} & R_{22} \end{bmatrix} \begin{bmatrix} \lambda_1 & 0\\ 0 & \lambda_2 \end{bmatrix} \begin{bmatrix} R_{11} & R_{21}\\ R_{12} & R_{22} \end{bmatrix}$
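
Going the other direction, a minimal sketch (angle and variances are arbitrary choices) builds a correlated covariance matrix from a rotation angle and two principal-axis variances:

```python
import numpy as np

theta = np.deg2rad(30.0)   # rotation angle (arbitrary choice)
lam1, lam2 = 4.0, 1.0      # variances along the principal axes (arbitrary choice)

R = np.array([[np.cos(theta),  np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])
Lambda = np.diag([lam1, lam2])

# Rotating the axis-aligned ellipse introduces correlation:
Sigma = R @ Lambda @ R.T
print(Sigma)   # the off-diagonal entries are now non-zero
```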

Now some fun facts: $\det(\Sigma) = \prod_{i}\lambda_i = \lambda_1\lambda_2$ and $\mathrm{tr}(\Sigma) = \sum_{i}\lambda_i = \lambda_1 + \lambda_2$. Here is the kicker: $R$ actually consists of the eigenvectors of $\Sigma$, and the $\lambda_i$ are the eigenvalues.
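
These two identities are easy to verify numerically (same illustrative matrix as before):

```python
import numpy as np

Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])          # illustrative covariance matrix
lam = np.linalg.eigvalsh(Sigma)         # eigenvalues of Sigma

print(np.isclose(np.linalg.det(Sigma), lam.prod()))   # determinant = product of eigenvalues
print(np.isclose(np.trace(Sigma),      lam.sum()))    # trace = sum of eigenvalues
```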

Finally, note that $\Sigma^{-1}$ is also PSD, with the following decomposition: $\Sigma^{-1} = (R\Lambda R^T)^{-1} = (\Lambda R^T)^{-1}R^{-1} = (R^T)^{-1}\Lambda^{-1}R^{-1} = R\Lambda^{-1}R^T$; in the last simplification, we made use of $RR^T = I$ (i.e. $R^{-1} = R^T$).

Furthermore: $\Lambda^{-1} = \begin{bmatrix} \frac{1}{\lambda_1} & 0\\ 0 & \frac{1}{\lambda_2} \end{bmatrix}$; that is, we simply take the reciprocal of each diagonal element!
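
A minimal sketch checking that the precision matrix shares the eigenvectors of $\Sigma$ and simply inverts the eigenvalues (same illustrative matrix as above):

```python
import numpy as np

Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])          # illustrative covariance matrix
eigvals, R = np.linalg.eigh(Sigma)

# Precision matrix two ways: direct inverse vs. R Lambda^{-1} R^T.
Precision = np.linalg.inv(Sigma)
Precision_via_eig = R @ np.diag(1.0 / eigvals) @ R.T
print(np.allclose(Precision, Precision_via_eig))   # True
```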

With this information, we are now ready to answer your questions!

[Figure: (a) the unit circle of an identity covariance matrix, (b) an axis-aligned ellipse after scaling the variances, (c) a rotated ellipse after introducing correlation]

How are dispersion and tightness related geometrically?

Dispersion gives you a sense of the area of the ellipse compared to that of the unit circle; tightness is the inverse of dispersion. Dispersion tells you how much the area of the unit circle (uncorrelated variables, identity eigenvectors) changes under $\Sigma$; tightness tells you how much area change you have to undo on the ellipse so that it ends up with unit variance.
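
One way to see this "undoing" numerically is to whiten correlated samples with $\Sigma^{-1/2}$; this is a minimal sketch under the same illustrative covariance matrix as above:

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])   # illustrative covariance matrix

# Sample correlated data, then undo the dispersion with the inverse square root of Sigma.
X = rng.multivariate_normal(mean=[0, 0], cov=Sigma, size=100_000)
eigvals, R = np.linalg.eigh(Sigma)
Sigma_inv_sqrt = R @ np.diag(1.0 / np.sqrt(eigvals)) @ R.T
Z = X @ Sigma_inv_sqrt.T

print(np.cov(Z, rowvar=False).round(2))   # approximately the identity: back to the unit circle
```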

What does the determinant of the inverse of the covariance matrix represent?

Since $\Lambda^{-1} = \begin{bmatrix} \frac{1}{\lambda_1} & 0\\ 0 & \frac{1}{\lambda_2} \end{bmatrix}$, the determinant of the precision matrix, $\frac{1}{\lambda_1\lambda_2}$, tells you how much area change you have to undo on your data's variance so that you end up with unit variance. Recall that $\det(\Sigma) = \lambda_1\lambda_2$.
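
Numerically, this is just the reciprocal of $\det(\Sigma)$; a quick check with the same illustrative matrix:

```python
import numpy as np

Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])          # illustrative covariance matrix
lam = np.linalg.eigvalsh(Sigma)

# det of the precision matrix equals 1 / (lambda_1 * lambda_2) = 1 / det(Sigma).
print(np.isclose(np.linalg.det(np.linalg.inv(Sigma)), 1.0 / lam.prod()))   # True
```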

What does the trace of the inverse of the covariance matrix represent?

It's equal to $\lambda_1^{-1} + \lambda_2^{-1}$. The geometric interpretation of $\mathrm{tr}(\Sigma^{-1})$ is less clear.
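
A last check of that identity, again on the same illustrative matrix:

```python
import numpy as np

Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])          # illustrative covariance matrix
lam = np.linalg.eigvalsh(Sigma)

# trace of the precision matrix equals the sum of reciprocal eigenvalues.
print(np.isclose(np.trace(np.linalg.inv(Sigma)), (1.0 / lam).sum()))   # True
```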
