I know that PCA can be obtained by eigendecomposition of the covariance matrix, and that the covariance matrix $S$ is given by $S = X^TX$, where $X$ is the centered data matrix.
But I am a bit confused about the dimensions of the covariance matrix.
Some resources define the data matrix as $X_{n \times d}$, where $n$ is the number of samples and $d$ is the dimension. Others use the opposite layout, $X_{d \times n}$. With the same formula $S = X^TX$, these two conventions yield covariance matrices of different dimensions, and therefore eigenvectors of different dimensions too.
I am not sure what I am getting wrong, but I think I am missing something important here.
Best Answer
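When $X$ is $n\times d$ (one sample per row), the scatter matrix (the covariance up to a scale factor) is $S=X^TX$. When it is $d\times n$ (one sample per column), $S=XX^T$. Either way, $S$ is $d\times d$. The centering also follows the layout: in the $n\times d$ case each column (feature) of $X$ is mean-centered, whereas in the $d\times n$ case each row is.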
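A minimal NumPy sketch of the two layouts, assuming randomly generated data (the helper names `scatter_rows_as_samples` and `scatter_cols_as_samples` are hypothetical). It centers along the axis that matches each layout and checks that both give the same $d\times d$ scatter matrix, which also matches `np.cov` up to the $n-1$ factor:

```python
import numpy as np

def scatter_rows_as_samples(X):
    """X has shape (n, d), one sample per row: center each column, return X^T X (d x d)."""
    Xc = X - X.mean(axis=0, keepdims=True)
    return Xc.T @ Xc

def scatter_cols_as_samples(X):
    """X has shape (d, n), one sample per column: center each row, return X X^T (d x d)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    return Xc @ Xc.T

rng = np.random.default_rng(0)
n, d = 100, 3
X_nd = rng.normal(size=(n, d))  # hypothetical data: n samples of dimension d
X_dn = X_nd.T                   # the same data in the d x n layout

S1 = scatter_rows_as_samples(X_nd)
S2 = scatter_cols_as_samples(X_dn)
print(S1.shape, S2.shape)       # (3, 3) (3, 3)
print(np.allclose(S1, S2))      # True

# np.cov divides by n - 1 by default; rowvar says whether rows or columns are the variables
print(np.allclose(S1 / (n - 1), np.cov(X_nd, rowvar=False)))  # True
print(np.allclose(S2 / (n - 1), np.cov(X_dn)))                 # True
```

The only thing that changes between conventions is which axis you average over and which side the transpose goes on.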
The logic is always the same: compute $$\sum_{i=1}^n x_i x_i^T,$$ where $x_i$ is one (centered) data sample of dimension $d\times 1$.
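To make the sum concrete, here is a small sketch, assuming random data in the $n\times d$ layout. It accumulates the outer products $x_i x_i^T$ one sample at a time and checks the result against the single matrix product $X^TX$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 4
X = rng.normal(size=(n, d))      # hypothetical data, n x d layout
Xc = X - X.mean(axis=0)          # center each feature (column)

# Accumulate sum_i x_i x_i^T, where x_i is the i-th centered sample as a d x 1 column
S_sum = np.zeros((d, d))
for i in range(n):
    x_i = Xc[i].reshape(d, 1)    # d x 1 column vector
    S_sum += x_i @ x_i.T         # d x d outer product

S_matrix = Xc.T @ Xc             # the same quantity as one matrix product
print(S_sum.shape)               # (4, 4)
print(np.allclose(S_sum, S_matrix))  # True
```

Each outer product is $d\times d$, so the sum is $d\times d$ regardless of how the samples were stacked into $X$.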
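The covariance matrix is typically estimated as $S/n$ or $S/(n-1)$. Since the two differ only by a scalar factor, the choice doesn't matter in PCA: scaling $S$ rescales its eigenvalues but leaves its eigenvectors unchanged.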
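A quick numerical check of the scaling point, assuming synthetic data with unequal spread along each axis (so the eigenvalues are distinct). It compares the eigendecompositions of $S$, $S/n$, and $S/(n-1)$ using `np.linalg.eigh`; eigenvectors are only defined up to sign, so the sketch compares absolute values:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 200, 3
# hypothetical anisotropic data: columns scaled by 3.0, 1.0 and 0.3
X = rng.normal(size=(n, d)) @ np.diag([3.0, 1.0, 0.3])
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc

# eigh is for symmetric matrices and returns eigenvalues in ascending order
vals_S, vecs_S = np.linalg.eigh(S)
vals_n, vecs_n = np.linalg.eigh(S / n)
vals_n1, vecs_n1 = np.linalg.eigh(S / (n - 1))

# Eigenvalues are simply rescaled by the same factor
print(np.allclose(vals_n, vals_S / n))          # True
print(np.allclose(vals_n1, vals_S / (n - 1)))   # True

# Eigenvectors (principal directions) agree up to sign
print(np.allclose(np.abs(vecs_S), np.abs(vecs_n)))   # True
print(np.allclose(np.abs(vecs_S), np.abs(vecs_n1)))  # True
```

Because the principal directions are unchanged, the projected scores `Xc @ vecs` are also the same up to sign; only the reported variances depend on the normalization.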