Does it make sense to use PCA when the determinant of the correlation matrix is (almost) zero?

I'm running a PCA over a data set of size $N \times p$ ($N \approx 1000$ being the number of measurements and $p \approx 200$ being the number of dimensions/predictors).

I expect many of the predictors to be correlated and that the dimensions can consequently be reduced. I can even drop some columns that are linearly dependent with respect to the others.

When I run the PCA I find that $\sim 50\%$ of the variance can be explained by the first 5 PCs, suggesting that the predictors can actually be grouped.

But I am concerned about how small the determinant of the correlation matrix $R$ is: $\det(R) \approx 10^{-100}$, or some similarly tiny number.

Do the results make sense with such a small number?

Moreover, I see that the PCA results change (a lot!) if I round the input numbers to drop non-relevant digits, like the 10th digit or so. I think this is linked with the fact we are working with such a small determinant.

Since a small determinant of $R$ indicates that there are redundant dimensions, I would say that PCA is the way to go to reduce them. Nevertheless, does it make sense to run a PCA with such a small determinant? If not, what is the best way to reduce the dimensionality of the problem?

Having a very small $\det(R)$ only means that you have some variables that are almost linearly dependent. Note that $\det(R)$ equals the product of the eigenvalues of $R$, so a determinant this small typically means that one or more eigenvalues are approximately zero.
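As a quick numerical check of the determinant/eigenvalue relationship, here is a minimal NumPy sketch. The data is synthetic (five independent predictors plus one hypothetical near-duplicate column), not the data from the question, and the noise level is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed; synthetic data, not the asker's

# Hypothetical data: 5 independent predictors plus one near-duplicate column
n = 1000
X = rng.normal(size=(n, 5))
near_copy = X[:, 0] + 1e-3 * rng.normal(size=n)  # almost linearly dependent on column 0
X = np.column_stack([X, near_copy])

R = np.corrcoef(X, rowvar=False)   # 6 x 6 correlation matrix
eigvals = np.linalg.eigvalsh(R)    # eigenvalues of the symmetric matrix, ascending

det_direct = np.linalg.det(R)           # determinant computed directly
det_from_eigs = np.prod(eigvals)        # product of the eigenvalues
sign, logdet = np.linalg.slogdet(R)     # sign and log|det|, more robust for tiny determinants

print("smallest eigenvalue:", eigvals[0])
print("det(R):", det_direct, "product of eigenvalues:", det_from_eigs)
print("log det(R):", logdet)
```

The near-duplicate column shows up as a single eigenvalue close to zero, and that one eigenvalue is what drags the determinant down. For a matrix as large as the one in the question, working with `slogdet` (or simply inspecting the eigenvalues) is usually more informative than the raw determinant.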

This only means that you have some extra/redundant dimensions in your dataset and that PCA will be able to represent essentially 100% of the information with a smaller set of dimensions ($p_\text{new} \le p - 1$).
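To illustrate, here is a small sketch of PCA on the correlation matrix of redundant data. Everything in it is an assumption made for the example: the data is synthetic (10 predictors driven by 3 latent factors plus tiny noise), and the 99.99% variance threshold used to pick $p_\text{new}$ is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed; synthetic data for illustration only

# Hypothetical setup: p = 10 observed predictors driven by k = 3 latent factors
n, k, p = 1000, 3, 10
latent = rng.normal(size=(n, k))
loadings = rng.normal(size=(k, p))
X = latent @ loadings + 1e-4 * rng.normal(size=(n, p))  # tiny noise: nearly rank 3

# Standardize the columns so that Z.T @ Z / (n - 1) is the correlation matrix R
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = (Z.T @ Z) / (n - 1)

# Eigendecomposition of R, sorted from largest to smallest eigenvalue
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Cumulative fraction of variance explained, and the number of PCs
# needed to reach the (arbitrary) 99.99% threshold
explained = np.cumsum(eigvals) / np.sum(eigvals)
p_new = int(np.searchsorted(explained, 0.9999) + 1)

# Project onto the first p_new PCs and reconstruct the standardized data
scores = Z @ eigvecs[:, :p_new]
Z_hat = scores @ eigvecs[:, :p_new].T
max_err = np.max(np.abs(Z - Z_hat))

print("det(R):", np.linalg.det(R))
print("components kept:", p_new, "of", p)
print("largest reconstruction error:", max_err)
```

In this toy case the determinant is tiny, yet the PCA is well behaved: the leading components carry essentially all of the variance and the discarded ones correspond to the near-zero eigenvalues.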
