Computing multiple correlation coefficient between first three PCA components and an additional variable

This is a follow-up to a previous question: How to separate groups using PCA?

I have $25$ normals and $12$ patients. For each of them I have a vector representing a spectrogram (length $2000$).

So I have a matrix $Z$ of size $[25\times 2000;\ 12\times 2000]$.

I calculate:

[coeffZ, score, latent, tsquared, explained, mu]=pca(Z); 

and bar(explained) shows me that the first $3$ PCs explain most of the variance.

I also have a behavioral score (from testing) for each of the subjects ($[37\times 1]$). It was suggested that I check whether the first 3 PCs can predict the behavioral score using the multiple correlation coefficient. Specifically this: http://en.wikipedia.org/wiki/Multiple_correlation

Does this make sense? Does anybody have an idea of how I can implement this in Matlab?

Even though the question as it stands now is arguably off-topic, I will provide a quick answer.

There is no function in MATLAB to directly compute the multiple correlation coefficient. In principle, you could either use the multiple regression function regress() to get $R^2$ and obtain your correlation coefficient $R$ by taking the square root, or use the canonical correlation function canoncorr(), which reduces to multiple correlation if one of the datasets consists of only a single variable.
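A minimal sketch of both routes, for illustration only. It assumes the Statistics and Machine Learning Toolbox is available; the data are synthetic stand-ins, and the variable names y, X, R_regress and R_canon are hypothetical (the question does not name the behavioural score vector).

```matlab
% Synthetic stand-in data (assumption): replace Z and y with the real ones.
rng(0);
Z = randn(37, 2000);          % 25 normals + 12 patients, one spectrogram per row
y = randn(37, 1);             % hypothetical behavioural score, one per subject

[~, score] = pca(Z);          % score(:,k) holds the k-th PC score of each subject
X = score(:, 1:3);            % first three PCs as predictors

% Route 1: multiple regression. regress() does not add an intercept,
% so a column of ones is prepended explicitly.
[~, ~, ~, ~, stats] = regress(y, [ones(size(X, 1), 1), X]);
R_regress = sqrt(stats(1));   % stats(1) is R^2

% Route 2: canonical correlation with a single-variable second set.
[~, ~, r] = canoncorr(X, y);
R_canon = r(1);               % the only canonical correlation

fprintf('R via regress: %.4f, R via canoncorr: %.4f\n', R_regress, R_canon);
```

The two values should agree up to floating-point error, since canonical correlation with a single target variable is the multiple correlation.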

However, in your particular case the three PCs are uncorrelated (this is a property of PCA), so the square of the multiple correlation of your behavioural score with the three PCs is simply equal to the sum of the squares of the usual correlations between the behavioural score and each of the three PCs. So you can compute these three correlation coefficients, square them, add them up, take the square root, and you are done.
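In symbols, if $r_1, r_2, r_3$ are the ordinary correlations of the behavioural score with each PC, then $R = \sqrt{r_1^2 + r_2^2 + r_3^2}$. A rough sketch of this shortcut follows, with a cross-check against regress(). The data are synthetic placeholders, the names y, rho, R_shortcut and R_check are hypothetical, and the Statistics and Machine Learning Toolbox is assumed.

```matlab
% Synthetic stand-in data (assumption): replace Z and y with the real ones.
rng(0);
Z = randn(37, 2000);              % one spectrogram per row
y = randn(37, 1);                 % hypothetical behavioural score

[~, score] = pca(Z);
X = score(:, 1:3);                % PC scores are centred and mutually uncorrelated

rho = corr(X, y);                 % 3x1 vector: correlation of y with each PC
R_shortcut = sqrt(sum(rho.^2));   % square, add up, take the square root

% Cross-check against the regression-based R (intercept column added by hand).
[~, ~, ~, ~, stats] = regress(y, [ones(size(X, 1), 1), X]);
R_check = sqrt(stats(1));

fprintf('shortcut: %.4f, regression: %.4f\n', R_shortcut, R_check);
```

The shortcut relies on the predictors being uncorrelated in the sample; it would not hold for arbitrary correlated predictors.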
