So let's say I have a bunch of data points in R^n, where n is pretty big (like, 50). I know this data falls into 3 clusters, and I know which cluster each data point belongs to. All I want to do is visualize these clusters in 2D in a way that maximizes the visible between-cluster separation, with the goal of showing that the clusters are easily separable based on the location of the data point in R^n alone.
The way I've been going about this up until now is to run PCA on the data points and then plot pairs of PCs at random until I find one where the clusters appear pretty cleanly separated. This approach seems pretty ad hoc, though, and it feels like there should be a principled way to find a PCA-style rotation of the data that, instead of maximizing overall variance, maximizes inter-cluster separation.
Is there a standard technique out there that does this? If not, any ideas about how to create such a transformation?
Best Answer
"a PCA-style rotation of the data that, instead of maximizing overall variance, maximizes inter-cluster separation" — discriminant analysis is exactly such a technique. A principal component maximizes variance along it; a discriminant function maximizes the ratio of between-cluster variance to pooled within-cluster variance along it. With k clusters there are at most k − 1 discriminant functions, so with your 3 clusters the first two discriminants give you precisely the 2D plot you're after.
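A minimal sketch of this with scikit-learn's `LinearDiscriminantAnalysis` — the data here is synthetic (three hypothetical Gaussian clusters in R^50 standing in for your real `X` and labels `y`):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical stand-in data: 3 clusters of 100 points each in R^50.
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 50)) * 5
y = np.repeat([0, 1, 2], 100)
X = centers[y] + rng.normal(size=(300, 50))

# With 3 classes, LDA yields at most 3 - 1 = 2 discriminant axes,
# which is exactly a 2D projection maximizing class separation.
lda = LinearDiscriminantAnalysis(n_components=2)
X_2d = lda.fit_transform(X, y)
print(X_2d.shape)  # (300, 2)
```

You can then scatter-plot `X_2d` colored by `y`; the axes are the two discriminant functions rather than the top two PCs.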