# Solved – test/technique/method for comparing principal components decompositions between samples

Is there any methodical way to compare the directions, magnitudes, etc. of PCA results for different samples drawn from the same population?

I'm leaving the nature of the test deliberately vague because I'd like to hear all the various possibilities… e.g. there might be (and I'm speculating here) a test comparing the sizes of the first principal components, a test comparing the directions of the principal components, or some kind of distance measure between PCA outcomes with a test statistic for their equality.

As far as a use case goes, I don't have one in mind. I'm asking just out of curiosity, maybe as an exploratory technique.


As far as I understand, you imagine that you have two clouds of $n$ points each in a $d$-dimensional space; you do PCA on each cloud separately and then want to compare the PCA results between the clouds, and to test for significant differences in some of the more important PCA features.

I don't think there are any standard tests for this purpose. For any specific question one can probably come up with some method or test, but your question is too broad to try to list all possible tests.

Still, one general approach that comes to mind is to use permutation tests. Say you want to test whether PC1 differs between the two sample sets ("clouds"). You can compute the angle $\theta$ between the two PC1 axes. Then you pool all $2n$ points together in one big cloud, randomly split it into two clouds of size $n$ (this is usually called "shuffling the labels"), run two PCAs and compute $\theta$ between the two PC1s. Random splits can be performed many times (say, $10\,000$ times), resulting in a distribution of $\theta$ expected under the null hypothesis of no difference between clouds. Then you simply compare your actual $\theta$ to this distribution: the fraction of random splits that give a $\theta$ at least as large as the observed one is your $p$-value.
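
A minimal sketch of this permutation test in Python, assuming NumPy is available and that each cloud is an array with one point per row. The helper names (`first_pc`, `pc1_angle`, `pc1_permutation_test`) are hypothetical, and the two simulated clouds at the bottom are made-up illustration data. Because a principal axis has no sign, the angle is computed from $|\cos\theta|$ and so lies in $[0, \pi/2]$.

```python
import numpy as np


def first_pc(X):
    """Unit vector along the first principal axis of the rows of X."""
    Xc = X - X.mean(axis=0)
    # Right singular vectors of the centred data are the PCA axes,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]


def pc1_angle(X, Y):
    """Angle in radians between the PC1 axes of X and Y, in [0, pi/2]."""
    # PC axes are only defined up to sign, hence the absolute value.
    cos = abs(float(np.dot(first_pc(X), first_pc(Y))))
    return float(np.arccos(min(cos, 1.0)))


def pc1_permutation_test(X, Y, n_perm=10_000, seed=None):
    """Permutation p-value using the PC1 angle as the test statistic.

    A large angle counts as evidence against H0 (no difference between clouds).
    """
    rng = np.random.default_rng(seed)
    observed = pc1_angle(X, Y)
    pooled = np.vstack([X, Y])
    n = len(X)
    exceed = 0
    for _ in range(n_perm):
        # "Shuffle the labels": random split into groups of the original sizes.
        idx = rng.permutation(len(pooled))
        if pc1_angle(pooled[idx[:n]], pooled[idx[n:]]) >= observed:
            exceed += 1
    # Count the observed split as one permutation so p is never exactly 0.
    return observed, (exceed + 1) / (n_perm + 1)


# Illustration: same elongated shape, second cloud rotated by 30 degrees.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2)) * [5.0, 1.0]
a = np.pi / 6
R = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
Y = (rng.normal(size=(100, 2)) * [5.0, 1.0]) @ R.T
theta, p = pc1_permutation_test(X, Y, n_perm=2_000, seed=0)
```

Splitting into groups of the original sizes means the same code also works when the two clouds have different numbers of points. Each permutation runs two PCAs, so $10\,000$ permutations cost $20\,000$ SVDs.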

The same approach can be used to compare, e.g., the largest eigenvalues, or the smallest eigenvalues, or almost anything else you want to compare.
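
Only the statistic changes. A hedged sketch, again assuming NumPy: the hypothetical `permutation_test` accepts any function of two samples that returns a number where larger means "more different", and `largest_eig_diff` / `smallest_eig_diff` are illustrative statistics taking the absolute difference of the largest or smallest eigenvalue of the two sample covariance matrices. The simulated data are made up for illustration.

```python
import numpy as np


def eigenvalues(X):
    """Eigenvalues of the sample covariance of X (rows = points), ascending."""
    return np.linalg.eigvalsh(np.cov(X, rowvar=False))


def largest_eig_diff(X, Y):
    return abs(eigenvalues(X)[-1] - eigenvalues(Y)[-1])


def smallest_eig_diff(X, Y):
    return abs(eigenvalues(X)[0] - eigenvalues(Y)[0])


def permutation_test(X, Y, statistic, n_perm=10_000, seed=None):
    """Generic two-sample permutation test; large statistic = evidence against H0."""
    rng = np.random.default_rng(seed)
    observed = statistic(X, Y)
    pooled = np.vstack([X, Y])
    n = len(X)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))  # shuffle the labels
        if statistic(pooled[idx[:n]], pooled[idx[n:]]) >= observed:
            exceed += 1
    return float(observed), (exceed + 1) / (n_perm + 1)


rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3)) * [2.0, 1.0, 1.0]  # largest variance about 4
Y = rng.normal(size=(200, 3)) * [4.0, 1.0, 1.0]  # largest variance about 16
stat, p = permutation_test(X, Y, largest_eig_diff, n_perm=2_000, seed=0)
```

Passing `smallest_eig_diff` (or an angle, a ratio of explained variance, etc.) instead of `largest_eig_diff` reuses the same permutation machinery unchanged.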

Apart from that, if you want a test statistic for "equality of PCA outcomes" overall, then maybe you should simply use a test comparing the two covariance matrices (without doing any PCA at all): PCA is fully determined by the covariance matrix, so equal covariance matrices mean equal PCA outcomes. E.g. Box's M-test, which is a multivariate generalization of Bartlett's test for equality of variances.
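
For reference, a minimal sketch of Box's M-test with its usual chi-square approximation, assuming NumPy and SciPy (`scipy.stats.chi2`); `box_m_test` is a hypothetical helper name. With $k$ groups of sizes $n_i$, total size $N$, dimension $d$, sample covariances $S_i$ and pooled covariance $S_p = \sum_i (n_i-1)S_i/(N-k)$, the statistic is $M = (N-k)\ln|S_p| - \sum_i (n_i-1)\ln|S_i|$, scaled by a small-sample factor $(1-c)$ and compared to $\chi^2$ with $d(d+1)(k-1)/2$ degrees of freedom. The test assumes multivariate normality and is known to be sensitive to departures from it, in which case the permutation approach may be the safer choice.

```python
import numpy as np
from scipy.stats import chi2


def box_m_test(*samples):
    """Box's M-test for H0: all groups share one covariance matrix.

    Each sample is an (n_i, d) array with one observation per row.
    Returns the chi-square-approximated statistic and its p-value.
    """
    k = len(samples)
    d = samples[0].shape[1]
    ns = np.array([len(s) for s in samples])
    N = int(ns.sum())
    # Unbiased sample covariances (divisor n_i - 1).
    covs = [np.atleast_2d(np.cov(s, rowvar=False)) for s in samples]
    pooled = sum((n - 1) * S for n, S in zip(ns, covs)) / (N - k)

    def logdet(S):
        # slogdet avoids the overflow/underflow of log(det(S)).
        return np.linalg.slogdet(S)[1]

    M = (N - k) * logdet(pooled) - sum((n - 1) * logdet(S) for n, S in zip(ns, covs))
    # Box's correction: c = (2d^2 + 3d - 1) / (6(d+1)(k-1)) * (sum 1/(n_i-1) - 1/(N-k))
    c = ((2 * d**2 + 3 * d - 1) / (6 * (d + 1) * (k - 1))) * (
        np.sum(1.0 / (ns - 1)) - 1.0 / (N - k)
    )
    stat = (1 - c) * M
    df = d * (d + 1) * (k - 1) / 2
    return float(stat), float(chi2.sf(stat, df))


rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))                    # covariance I
Y = rng.normal(size=(200, 3)) * [2.0, 1.0, 1.0]  # variance 4 along the first axis
stat, pval = box_m_test(X, Y)
```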
