I am investigating several variants of PCA, and I am trying to find out whether principal component regression (PCR) is appropriate for my analysis; hence my question about when to use PCR.
When doing a PCA, you are effectively choosing a new set of 'variables' whose values you know for all your observations. They are linear combinations of the original covariates, and their main property is that each one captures as much variance as possible in a single dimension while being uncorrelated with the previous ones (the first PC captures the most, the second the most of what remains, and so on). This is what makes PCA work as a dimension reduction: if 3 PCs contain 99% of the variance carried by 100 covariates, there seems to be little reason to keep all 100 covariates.
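A small sketch of the "3 PCs out of 100 covariates" situation, assuming NumPy and purely synthetic data (100 covariates generated from 3 hidden factors plus a little noise). The helper name `explained_variance_ratio` is just illustrative; it computes each PC's share of the total variance from the singular values of the centered data.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component."""
    Xc = X - X.mean(axis=0)                      # center each covariate
    # Squared singular values of the centered data are proportional to the PC variances
    s = np.linalg.svd(Xc, compute_uv=False)      # returned in descending order
    var = s ** 2
    return var / var.sum()

rng = np.random.default_rng(0)
n, p, k = 500, 100, 3
# Assumed toy data: 3 hidden factors with different spreads drive all 100 covariates
latent = rng.normal(size=(n, k)) * np.array([10.0, 6.0, 3.0])
loadings = rng.normal(size=(k, p))
X = latent @ loadings + 0.1 * rng.normal(size=(n, p))

ratio = explained_variance_ratio(X)
print("share of variance in first 3 PCs:", ratio[:3].sum())
```

On data like this, nearly all the variance sits in the first three components, which is exactly the case where dropping the remaining 97 looks attractive.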
PCR essentially performs a regression on a subset of the principal components instead of on the original covariates. At first sight this makes sense, and in quite a few cases it does work.
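A minimal PCR sketch, assuming NumPy and ordinary least squares on the first `k` PC scores; the function names `pcr_fit` and `predict` and the synthetic data are illustrative only. The fitted coefficients on the PC scores are mapped back to coefficients on the original covariates so the model can be applied to new `X` directly.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal component regression: OLS of y on the first k PC scores of X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Rows of Vt are the PC directions, sorted by decreasing variance
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:k].T                                   # p x k loading matrix
    Z = Xc @ V_k                                     # n x k PC scores
    gamma, *_ = np.linalg.lstsq(Z, y - y_mean, rcond=None)
    beta = V_k @ gamma                               # coefficients on the original covariates
    intercept = y_mean - x_mean @ beta
    return intercept, beta

def predict(intercept, beta, X):
    return intercept + X @ beta

rng = np.random.default_rng(1)
n, p = 300, 50
# Assumed toy data: 2 hidden factors drive both the 50 covariates and the response
latent = rng.normal(size=(n, 2)) * np.array([5.0, 2.0])
X = latent @ rng.normal(size=(2, p)) + 0.1 * rng.normal(size=(n, p))
y = 3.0 * latent[:, 0] - 2.0 * latent[:, 1] + rng.normal(size=n)

intercept, beta = pcr_fit(X, y, k=2)
r2 = 1 - np.var(y - predict(intercept, beta, X)) / np.var(y)
print("in-sample R^2 with 2 PCs:", r2)
```

Here the response happens to depend on the high-variance directions, so two components are enough; that is the favourable case.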
However, in this regard it is useful to look at Fisher's interpretation of discriminant analysis: he poses the problem as finding the direction(s) along which the between-class variance is maximal relative to the within-class variance.
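For two classes with means $\boldsymbol{\mu}_0, \boldsymbol{\mu}_1$, Fisher's criterion for a projection direction $\mathbf{w}$ can be written as follows (standard textbook form, stated here for comparison with PCA):

$$
J(\mathbf{w}) = \frac{\mathbf{w}^\top S_B\, \mathbf{w}}{\mathbf{w}^\top S_W\, \mathbf{w}},
\qquad
S_B = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0)(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0)^\top,
\qquad
S_W = \sum_{c \in \{0,1\}} \sum_{i \in c} (\mathbf{x}_i - \boldsymbol{\mu}_c)(\mathbf{x}_i - \boldsymbol{\mu}_c)^\top,
$$

which is maximized by $\mathbf{w}^\ast \propto S_W^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0)$. The first principal component instead maximizes $\mathbf{w}^\top \Sigma\, \mathbf{w}$ subject to $\lVert \mathbf{w} \rVert = 1$, where $\Sigma$ is the overall covariance of the covariates, so the class labels never enter.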
This is where PCA fails somewhat (or could fail): it finds the direction in which the overall variance of the covariates is maximal (a much simpler problem, since it ignores the classes entirely), and then hopes this direction also discriminates well. So the method has been criticized, but that need not stop it from working 🙂
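A toy illustration of how this can go wrong, assuming NumPy and synthetic 2-D data in which the classes differ only along a low-variance axis. It compares the Fisher ratio (between-class over within-class variance of the projection) for the first PC and for the Fisher direction $S_W^{-1}(\mu_1 - \mu_0)$; the helper `fisher_ratio` is illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
# Assumed toy data: large shared spread along axis 0, class difference only along axis 1
spread = rng.normal(scale=[10.0, 1.0], size=(2 * n, 2))
labels = np.repeat([0, 1], n)
X = spread + np.outer(labels, [0.0, 3.0])
X0, X1 = X[labels == 0], X[labels == 1]

def fisher_ratio(w, X0, X1):
    """Between-class over within-class variance of the 1-D projections X @ w."""
    z0, z1 = X0 @ w, X1 @ w
    return (z1.mean() - z0.mean()) ** 2 / (z0.var() + z1.var())

# PCA direction: top right-singular vector of the centered pooled data (labels ignored)
_, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
w_pca = Vt[0]

# Fisher direction: S_W^{-1} (m1 - m0), using the pooled within-class scatter
m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
S_w = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
w_lda = np.linalg.solve(S_w, m1 - m0)
w_lda /= np.linalg.norm(w_lda)

r_pca = fisher_ratio(w_pca, X0, X1)
r_lda = fisher_ratio(w_lda, X0, X1)
print("first PC:", r_pca, " Fisher direction:", r_lda)
```

The first PC lines up with the high-variance axis that carries no class information, so projecting onto it nearly erases the separation that the Fisher direction keeps.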
In general, running an unsupervised, clustering-style algorithm on your covariates first and then using the results for classification is not a practice I would recommend: the strongest structure in the covariates alone may well not be the most efficient one for predicting another variable.
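The same issue carries over to PCR. A sketch under assumed synthetic data (NumPy, illustrative helper `r2_on_first_k_pcs`): the response depends only on a low-variance covariate, so keeping just the top component throws away the predictive signal.

```python
import numpy as np

def r2_on_first_k_pcs(X, y, k):
    """R^2 of an OLS fit of y on the first k principal-component scores of X."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:k].T                    # scores on the k highest-variance directions
    yc = y - y.mean()
    gamma, *_ = np.linalg.lstsq(Z, yc, rcond=None)
    resid = yc - Z @ gamma
    return 1 - (resid @ resid) / (yc @ yc)

rng = np.random.default_rng(3)
n = 400
# Assumed toy data: x0 has large variance but is irrelevant; y depends only on low-variance x1
X = rng.normal(scale=[10.0, 1.0], size=(n, 2))
y = 2.0 * X[:, 1] + 0.5 * rng.normal(size=n)

r2_k1 = r2_on_first_k_pcs(X, y, k=1)   # keeps only the high-variance direction
r2_k2 = r2_on_first_k_pcs(X, y, k=2)   # keeps both directions
print("R^2 with 1 PC:", r2_k1, " with 2 PCs:", r2_k2)
```

Choosing the number of components by cross-validated prediction error, rather than by variance explained alone, is one common way to guard against this.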