My ultimate goal is to run a cluster analysis on a data set with more than 1 million records. The inputs to the clustering will be the results of a Principal Component Analysis plus some other variables that were not included in the PCA, for a total of maybe 10 variables. The variables I put into the PCA were all very highly correlated with one another, while the other variables are not, so I left those out of the PCA.
    # read data
    mydata <- read.csv('mydata.csv')

    # import library for robust methods because my data contained outliers
    library(rrcov)

    # run robust PCA method called PcaCov
    pcaR <- PcaCov(~., mydata, na.action = na.omit, center = TRUE, scale = TRUE, k = 8)

    # look at results
    summary(pcaR)
    screeplot(pcaR)
    pcaR@…  # slot name garbled in the source
From the results, I have decided I would like to retain the first three components, which capture ~87% of the total variance in the dataset.
Now I want to extract/save/export these first three components for use in the cluster analysis with my other variables. How do I do this?
Best Answer
For each component obtained by PCA you have a loading vector, for example $v=(1,-2,5,5)$. This vector defines your new variable as a linear combination of the original ones: $z_1=x_1-2x_2+5x_3+5x_4$. You can therefore build a new matrix whose columns are the linear combinations defined by the loadings of the components you keep (here, the first three). Because the PCA was run with centering and scaling, the $x_i$ in these combinations should be the centered and scaled variables, not the raw ones.
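The linear-combination step can be sketched in a few lines of R. This is only an illustration under stated assumptions: it uses simulated toy data and base R's classical `prcomp` as a stand-in for the robust `PcaCov` fit, and the names `pca_vars`, `other_vars`, `cluster_input` and `out_file` are hypothetical. For the `rrcov` object in the question, the accessors `getLoadings(pcaR)` and `getScores(pcaR)` should give the corresponding matrices. If `na.omit` dropped rows, the scores would need to be matched to the remaining rows before being combined with the other variables.

```r
# Toy data standing in for mydata: four correlated columns for the PCA
# plus one unrelated column that is kept out of the PCA.
set.seed(1)
n <- 200
base <- rnorm(n)
pca_vars <- data.frame(
  x1 = base + rnorm(n, sd = 0.3),
  x2 = base + rnorm(n, sd = 0.3),
  x3 = base + rnorm(n, sd = 0.3),
  x4 = base + rnorm(n, sd = 0.3)
)
other_vars <- data.frame(u = runif(n))

# Classical PCA used here as a stand-in (assumption: not the robust PcaCov).
fit <- prcomp(pca_vars, center = TRUE, scale. = TRUE)

k <- 3                                  # number of components to keep
loadings_k <- fit$rotation[, 1:k]       # p x k matrix, one loading vector per column

# Apply the same centering and scaling the PCA used, then form the
# linear combinations z_j = sum_i v_ij * x_i for every record.
x_std <- scale(pca_vars, center = fit$center, scale = fit$scale)
scores_k <- x_std %*% loadings_k

# The hand-computed scores agree with the ones stored on the fitted object.
stopifnot(isTRUE(all.equal(unname(scores_k), unname(fit$x[, 1:k]))))

# Combine the retained components with the other variables and export.
cluster_input <- cbind(as.data.frame(scores_k), other_vars)
out_file <- tempfile(fileext = ".csv")  # hypothetical output path
write.csv(cluster_input, out_file, row.names = FALSE)
```

The resulting `cluster_input` data frame has one column per retained component plus the untouched variables, so it can be passed to a clustering routine directly or read back from the CSV later.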