I am currently working on a modification of a clustering algorithm to suit my problem domain.
Which methods are available for comparing the centroids generated by the two methods?
That is, I want to know how well my (modified) clustering method agrees with the current method, the current method being the gold standard.
Would highly appreciate any help.
Best Answer
It depends on which clustering method you are using. There are many possible definitions of what makes a clustering "good", and it is easy to conceive of datasets that admit more than one set of "good" clusters, each very different from the others.
Since you have a baseline clustering method that you are treating as "the gold standard", you should find out what its loss function is and use that. For example, k-means attempts to minimize the function
$$ \sum_{i=1}^{k} \sum_{\forall x \in \text{Cluster}_i} \|x - \mu_i\|^2 $$
where $\mu_i$ is the centroid of cluster $i$.
You can easily compute this for the results of k-means, and then for any other algorithm that generates a centroid for each cluster. Whichever has the lowest score on the same data is doing better at clustering by the defined metric.
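As a rough illustration, here is a minimal sketch of that comparison in Python with NumPy. The function name `kmeans_objective` and the toy data are made up for this example. It also assumes that, when no cluster labels are supplied, each point belongs to its nearest centroid, which may not match how your modified method assigns points.

```python
import numpy as np

def kmeans_objective(X, centroids, labels=None):
    """Sum of squared distances from each point to its cluster centroid.

    X:         (n_points, n_features) data
    centroids: (k, n_features) centroids produced by some method
    labels:    optional (n_points,) cluster index per point; if omitted,
               each point is assigned to its nearest centroid (an assumption)
    """
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Squared distance from every point to every centroid, shape (n_points, k)
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    if labels is None:
        # Nearest-centroid assignment: take the smallest distance per point
        return float(d2.min(axis=1).sum())
    labels = np.asarray(labels, dtype=int)
    # Use the distance to the centroid of the cluster each point was given
    return float(d2[np.arange(len(X)), labels].sum())

# Toy data and two hypothetical sets of centroids for the same points
X = [[0, 0], [0, 2], [10, 0], [10, 2]]
centroids_baseline = [[0, 1], [10, 1]]   # e.g. from the gold-standard method
centroids_modified = [[0, 0], [10, 0]]   # e.g. from the modified method

score_baseline = kmeans_objective(X, centroids_baseline)
score_modified = kmeans_objective(X, centroids_modified)
print(score_baseline, score_modified)
```

Both scores must be computed on the same data for the comparison to mean anything. If your method outputs its own cluster assignments, passing them as `labels` evaluates the formula exactly as written rather than relying on nearest-centroid assignment.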