Solved – Plotting clusters in PAM

I ran a cluster analysis (PAM) with 4 clusters and 7 variables. The resulting diagram is:
[Image: clusplot of the 4 clusters, with axes labeled "Component 1" and "Component 2"]

What are the axes showing? I have 7 variables forming the clusters, so how can the clusters be shown in this 2-D diagram instead of a 7-D diagram? What do "component 1" and "component 2" represent? How are they calculated? Any help is highly appreciated.

R's clusplot, which draws the plot for a PAM result, performs a principal components analysis (PCA). The clustering itself still uses all 7 variables; PCA is only used to display it. PCA is essentially an analysis of the covariance (or correlation) matrix of your variables to determine how much of the variability in the data can be explained by a smaller set of uncorrelated basis vectors. It's a way of detecting and removing correlation among your original variables.
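To make the idea concrete, here is a minimal sketch in Python with NumPy (not the R code that clusplot runs). The data are synthetic and every name is made up for illustration: 7 variables are generated from 2 hidden factors, so they are strongly correlated. The eigenvectors of the covariance matrix give the new basis, and the data expressed in that basis come out uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 observations of 7 correlated variables,
# built from 2 hidden factors plus a little noise (an assumption for this demo).
latent = rng.normal(size=(200, 2))
loadings = rng.normal(size=(2, 7))
X = latent @ loadings + 0.1 * rng.normal(size=(200, 7))

# Center the data and form the 7x7 covariance matrix of the variables.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)

# Eigen-decomposition of the symmetric covariance matrix.
# eigh returns eigenvalues in ascending order, so sort them descending.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]  # columns are the principal directions

# Express every observation in the new basis (the principal component scores).
scores = Xc @ eigvecs

# The covariance of the scores is diagonal: the components are uncorrelated,
# and each diagonal entry is the variance carried by that component.
score_cov = np.cov(scores, rowvar=False)
print(np.round(score_cov, 3))
```

The printed matrix has (numerically) zero off-diagonal entries, which is what "removing correlation" means here.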

If most of the variation (say, more than 90%) is captured by the first two principal components, then using all 7 of your variables does not add much useful information, because some of them are probably correlated. In that case, representing your data in terms of the principal components lets you reduce the dimensionality (e.g. from 7 original variables to 2 principal components).
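The share of variation held by each component can be read off the eigenvalues of the covariance matrix. The sketch below, in Python with NumPy, uses synthetic data (7 variables driven by 2 hidden factors, an assumption chosen so that two components dominate); the 90% threshold is just the example figure from the text, not a rule.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 7 variables generated from 2 hidden factors plus small noise.
latent = rng.normal(size=(200, 2))
loadings = rng.normal(size=(2, 7))
X = latent @ loadings + 0.1 * rng.normal(size=(200, 7))

# Eigenvalues of the covariance matrix, largest first.
cov = np.cov(X, rowvar=False)
eigvals = np.linalg.eigvalsh(cov)[::-1]

# Each eigenvalue is the variance along one principal component,
# so dividing by the total gives the proportion of variance explained.
explained = eigvals / eigvals.sum()
cumulative = np.cumsum(explained)

# Smallest number of components whose cumulative share reaches 90%.
n_needed = int(np.searchsorted(cumulative, 0.90) + 1)

for i, (e, c) in enumerate(zip(explained, cumulative), start=1):
    print(f"component {i}: {100 * e:6.2f}%  (cumulative {100 * c:6.2f}%)")
print("components needed for 90%:", n_needed)
```

With real data the split is rarely this clean; if the cumulative share of the first two components is low, a 2-D plot hides a lot of the structure.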

The clusplot uses the first two principal components (the two that explain the most variance) as the X and Y axes, and plots each data point (green symbols) at its values on the first and second principal components. These are "component 1" and "component 2". It also reports the percentage of variability in the data that the first two principal components explain.
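As a rough illustration of how 7-D points end up as 2-D coordinates, the following Python/NumPy sketch projects synthetic data onto its first two principal components. The four groups, their labels, and all variable names are invented for the example, and the SVD route shown here is one common way to compute PCA, not necessarily what clusplot does internally.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 4 groups of 50 points each in 7 dimensions.
# The labels stand in for cluster assignments (they are made up, not PAM output).
centers = rng.normal(scale=4.0, size=(4, 7))
labels = np.repeat(np.arange(4), 50)
X = centers[labels] + rng.normal(size=(200, 7))

# Center the data, then take the SVD; the rows of Vt are the principal directions.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Coordinates of each point on component 1 and component 2.
scores_2d = Xc @ Vt[:2].T

# Percentage of total variability explained by these two components.
explained = s**2 / np.sum(s**2)
pct_two = 100 * explained[:2].sum()
print(f"first two components explain {pct_two:.1f}% of the variability")

# Where each group sits in the 2-D plane.
for k in range(4):
    c1, c2 = scores_2d[labels == k].mean(axis=0)
    print(f"group {k}: component 1 = {c1:7.2f}, component 2 = {c2:7.2f}")
```

Plotting the two columns of `scores_2d` against each other, with one symbol per group, would give a picture similar in spirit to the clusplot.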

Each ellipse outlines the region covered by one of your 4 clusters in the plane of the first two principal components.

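So the clusplot is a visualization of two things: how separable your clusters are, and how much redundant information is in your data set. Lots of overlap between the ellipses suggests the clusters are not well separated, so the clustering did not work very well. Keep in mind that this is a 2-D projection, so the lower the percentage of variability explained by the two components, the less the plot can tell you about separation in the full 7-D space.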

I didn't cover all the issues of PCA and clustering in this answer, but hopefully it gives you a starting point for looking at more detailed references.
