I am trying to implement k-means clustering with the standard algorithm.
My implementation seems to work, but I noticed some strange behavior: if k is close to half the length of the list being clustered, I get a result set that is empty. I am not sure whether this is correct behavior.
I think the worst case is k equal to the length of the list, where each result set has exactly one element. Empty result sets must occur if k is greater than the length of the list, but that is an invalid input.
Best Answer
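The behavior you describe is perfectly correct. The standard algorithm does not guarantee that every cluster stays non-empty: in any assignment step a centroid can lose all of its points to neighboring centroids. Choosing $K$ this large relative to the length of your list makes that more likely and is one of the reasons you get empty clusters. Be wise when choosing $K$ and your initial set of centroids (which I assume you sampled from your population).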
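To make this concrete, here is a minimal sketch of the standard (Lloyd's) iteration in plain Python, run on a small made-up 1-D data set. The function name `lloyd_kmeans`, the data, and the starting centroids are all illustrative assumptions, not your code. The policy of leaving an empty cluster's centroid where it was is also just one possible choice; other implementations re-seed it instead.

```python
import math

def lloyd_kmeans(points, init_centroids, max_iter=100):
    """Standard (Lloyd's) k-means on a list of equal-length tuples."""
    centroids = [tuple(c) for c in init_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centroids.append(mean)
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(old)
        if new_centroids == centroids:
            break  # centroids stopped moving, so assignments are stable
        centroids = new_centroids
    sizes = [labels.count(j) for j in range(len(centroids))]
    return labels, centroids, sizes

# Hypothetical data: the initial centroids are sampled from the points,
# so every cluster is non-empty after the first assignment step.
points = [(4.0,), (5.0,), (14.0,), (15.5,), (16.0,), (16.5,), (17.0,), (25.0,)]
init_centroids = [(4.0,), (5.0,), (25.0,)]

labels, centroids, sizes = lloyd_kmeans(points, init_centroids)
print(sizes)  # [2, 0, 6]
```

The middle cluster starts with the points 5 and 14, so its centroid moves to 9.5. In the next assignment step, 5 is closer to the first centroid and 14 is closer to the third, which leaves the middle cluster empty even though $K = 3$ is well below the number of points.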
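Remember also that even though K-means is an optimization problem, its objective function is not convex, so the standard algorithm only converges to a local optimum that depends on the initial centroids.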
Last but not least, run K-means several times with the same $K$ but different initial centroids and compare the results, so you get an idea of how stable your solution is.
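One way to do such a comparison is sketched below, again in plain Python. The helper names (`nearest`, `kmeans_once`), the toy data, and the choice of the within-cluster sum of squares as the quantity to compare are assumptions made for illustration. Each run samples its initial centroids from the data with a different seed.

```python
import math
import random

def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))

def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run with initial centroids sampled from the data."""
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        labels = [nearest(p, centroids) for p in points]
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centroids.append(mean)
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Final assignment and within-cluster sum of squared distances.
    labels = [nearest(p, centroids) for p in points]
    wcss = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    sizes = [labels.count(j) for j in range(k)]
    return wcss, sizes

# Hypothetical data: three small groups of four points each.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0),
    (10.0, 0.0), (10.0, 1.0), (11.0, 0.0), (11.0, 1.0),
]

runs = [kmeans_once(points, k=3, rng=random.Random(seed)) for seed in range(10)]
for seed, (wcss, sizes) in enumerate(runs):
    print(f"seed={seed}  wcss={wcss:.3f}  sizes={sorted(sizes)}")

best_wcss = min(wcss for wcss, _ in runs)
print("best wcss:", round(best_wcss, 3))
```

If most seeds end at the same sum of squares and the same cluster sizes, the solution is probably stable for that $K$. Widely varying values, or runs that report a cluster of size 0, suggest that $K$ or the initialization deserves another look.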