I am using R (R Commander) to cluster my data. I have a smaller subset of my data containing 200 rows and about 800 columns. When I run k-means clustering and try to plot the result on a graph, I get the following error:
'princomp' can only be used with more units than variables
I then created a test data set of 10 rows and 10 columns, which plots fine, but when I add an extra column I get the error again. Why is this? I need to be able to plot my clusters. When I view my data set after performing kmeans on it, I can see the extra results column showing which cluster each row belongs to.
Is there anything I am doing wrong? Can I get rid of this error and plot my larger sample?
Best Answer
The clustering itself has no problem with the p > n situation (more variables than observations). The visualization, however, internally uses princomp, which cannot handle p > n, to plot the similarity-space projection. That is why your 10 x 10 test plots fine but fails as soon as you add an eleventh column.
You can't fix that in the plotting step itself; at most, you can try to reproduce a similar graph by obtaining the similarity-space projection with cmdscale(dist(...)) and coloring the points by cluster.
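
A minimal sketch of the cmdscale(dist(...)) approach in plain R, outside R Commander. The matrix x is hypothetical random data standing in for the real 200 x 800 subset, and the choice of 3 clusters is an assumption for illustration only; substitute your own data and cluster count.

```r
set.seed(42)  # make the random data and the k-means start reproducible

# Hypothetical stand-in for the real data: 200 rows, 800 columns (p > n)
x <- matrix(rnorm(200 * 800), nrow = 200, ncol = 800)

# k-means works fine with more columns than rows
# (3 clusters is an assumed value, not taken from the question)
km <- kmeans(x, centers = 3)

# Classical multidimensional scaling on the pairwise Euclidean distances
# gives a 2-D projection without calling princomp
coords <- cmdscale(dist(x), k = 2)

# Plot the projection, coloring each point by its assigned cluster
plot(coords, col = km$cluster, pch = 19,
     xlab = "Coordinate 1", ylab = "Coordinate 2",
     main = "k-means clusters on an MDS projection")
```

The result will probably not look identical to the R Commander plot, but it should show the same kind of picture: each row of the data as a point in two dimensions, colored by the cluster that kmeans assigned to it.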