I performed and plotted a kmeans analysis in R with the following commands:

    km = kmeans(t(mat2), centers = 4)
    plotcluster(t(mat2), km$cluster)  # from library(fpc)

Here is the result from the plot:
This question is related to a previous question: Previous Question
My data matrix has dimensions $291 \times 31$ (after taking the transpose with t(mat2)).
What I want to know is how I can create a mapping from each row of the matrix to a 2D point in the plot. My idea is to get the $31$-dimensional coordinates for each point in the plot and then compute the 2D coordinates with discrproj(). For example, I see that I should be able to find the 2D center points of all clusters by calling discrproj() on the matrix given by km$centers (which has dimensions $4 \times 31$ and hence contains the coordinates of each cluster center in $31$-dimensional space).
However, where is the data for the coordinates in $31$-dimensional space for every 2D point in the plot? Is this data just my $291 \times 31$ data matrix? In summary:
- How can I create a mapping from each row in the $291 \times 31$ data matrix to a 2D point in the plot?
- Where/what is the data for the coordinates in $31$-dimensional space for every 2D point in the plot?
Best Answer
First, let's generate some example data and cluster it:

    library(fpc)  # provides rFace(), discrproj() and plotcluster()
    data <- rFace(1000)
    km <- kmeans(data, 6)

Now, we can use discrproj to find an appropriate projection that separates these clusters. Note that kmeans stores the cluster assignments in km$cluster:

    dp = discrproj(data, km$cluster)

The result, dp, has several fields that are potentially useful. The field dp$proj contains the coordinates of the original data points, projected onto our new space; its rows are in the same order as the rows of data, so row i of dp$proj is the projection of row i of data. This space has the same dimensionality as the original space, but the first two dimensions separate the clusters best (which is what plotcluster actually displays).
Compare:

    plot(dp$proj[,1], dp$proj[,2], pch=km$cluster+48, col=km$cluster)  # +48 to get labels correct

with:

    plotcluster(data, km$cluster)

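To make the row-to-point mapping concrete, here is a minimal sketch. It assumes the fpc package is installed and uses a synthetic $291 \times 31$ matrix `x` as a stand-in for t(mat2) (it is not the asker's data). The names `x`, `grp`, `coords2d`, and the `obs` row labels are made up for illustration.

```r
library(fpc)  # provides discrproj()

set.seed(1)
# Synthetic stand-in for t(mat2): 291 observations in 31 dimensions.
x <- matrix(rnorm(291 * 31), nrow = 291, ncol = 31)
# Shift four groups along four different columns so k-means has structure to find.
grp <- rep(1:4, length.out = 291)
for (g in 1:4) x[grp == g, g] <- x[grp == g, g] + 6
rownames(x) <- paste0("obs", seq_len(nrow(x)))

km <- kmeans(x, centers = 4, nstart = 10)
dp <- discrproj(x, km$cluster)

# Row i of dp$proj belongs to row i of x; the first two columns are the
# coordinates of the 2D view.
coords2d <- data.frame(
  row     = rownames(x),
  cluster = km$cluster,
  dc1     = dp$proj[, 1],
  dc2     = dp$proj[, 2]
)
head(coords2d)
```

Keeping the row label and the cluster next to the two coordinates makes it easy to look up any plotted point, for example with `coords2d[coords2d$row == "obs17", ]`.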
Suppose you get some new points in your original space. You can project them into your new space using the basis vectors in dp$units, like this:

    newpts = newdata %*% dp$units[,1:2]

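The same multiplication covers the cluster centers mentioned in the question: km$centers lives in the original space, so its rows can be treated as new points. The following is a sketch only. It assumes fpc is installed, that the default discrproj method is used, and that dp$proj is the data multiplied by dp$units without any centering; the all.equal line is there so you can check that assumption on your own data. The name `centers2d` is made up for illustration.

```r
library(fpc)  # provides rFace() and discrproj()

set.seed(1)
data <- rFace(1000)               # example data shipped with fpc
km <- kmeans(data, centers = 6)
dp <- discrproj(data, km$cluster)

# Check the assumption: projecting the original rows by hand should
# reproduce the first two columns of dp$proj.
all.equal(unname(dp$proj[, 1:2]), unname(data %*% dp$units[, 1:2]))

# Treat the cluster centers as new points in the original space.
centers2d <- km$centers %*% dp$units[, 1:2]

plot(dp$proj[, 1], dp$proj[, 2], col = km$cluster)
points(centers2d, pch = 4, cex = 2, lwd = 2)  # crosses mark the projected centers
```

Because the projection is linear and each k-means center is the mean of its cluster, each projected center should sit at the mean of that cluster's projected points.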
That should answer your first question. Unfortunately, I think the second part is effectively unanswerable because there are infinitely many points in the 31-d space that correspond to a given point in the 2D space.
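The non-uniqueness can be seen directly in a small base R sketch. It uses a made-up $31 \times 2$ matrix `U` as a hypothetical stand-in for dp$units[,1:2] (it is not output from fpc) and builds two different 31-dimensional points that land on the same 2D point.

```r
set.seed(1)
p <- 31
U <- matrix(rnorm(p * 2), nrow = p, ncol = 2)  # hypothetical projection basis

a <- rnorm(p)  # some point in the 31-dimensional space

# Build a direction that U maps to zero: take a random vector and remove
# its least-squares fit on the columns of U.
v <- rnorm(p)
null_dir <- v - U %*% solve(crossprod(U), crossprod(U, v))

b <- a + 5 * as.vector(null_dir)  # a different 31-dimensional point

proj_a <- a %*% U
proj_b <- b %*% U
all.equal(proj_a, proj_b)  # the 2D images agree up to rounding
max(abs(a - b))            # yet the original points differ
```

Any multiple of `null_dir` (or of any other direction that `U` maps to zero) can be added to `a` without moving its 2D image, which is why a 2D point alone cannot be mapped back to a single 31-dimensional point.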