I am trying to implement K-means clustering with the standard algorithm.

My implementation seems to work, but I noticed some strange behavior. If *k* is close to half the length of the list being analyzed, I get a result set that is empty. I am not sure whether this is correct behavior.

I think the worst case is *k* equal to the length of the list, where each result set has only 1 element. Empty result sets would happen if *k* is greater than the length of the list, but that is an invalid situation.

#### Best Answer

The behavior you describe is perfectly correct. Choosing $K$ so large relative to your list length is one of the reasons you get empty clusters. Be wise when choosing $K$ and your initial set of centroids (which I assume you sampled from your population).
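As a minimal sketch of one way an empty cluster can arise, assume the list contains repeated values and that two of the sampled initial centroids happen to coincide. The helper name `assign`, the data, and the tie-breaking rule (lowest index wins) are all hypothetical choices for illustration, not taken from your implementation.

```python
def assign(points, centroids):
    """Group each point with its nearest centroid (ties go to the lowest index)."""
    clusters = [[] for _ in centroids]
    for p in points:
        # min() returns the first index with the smallest distance
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    return clusters


# Illustrative data: 6 points, k = 3 (half the list length).
points = [1.0, 1.0, 2.0, 8.0, 9.0, 9.0]
# Assumed initial centroids, sampled from the list; two of them coincide.
centroids = [1.0, 1.0, 9.0]

clusters = assign(points, centroids)
print(clusters)  # the second cluster receives no points
```

The larger $K$ is relative to the list length, the more likely it becomes that two centroids start at (or move to) positions where one of them wins no points at all.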

Remember also that, although K-means is an optimization problem, its objective function is not convex, so the algorithm can settle in different local minima depending on the initial centroids.
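For reference, the objective usually associated with K-means is the within-cluster sum of squares. The notation is the standard textbook one rather than anything from the question: $C_j$ (the set of points assigned to cluster $j$) and $\mu_j$ (its centroid) are names introduced here.

$$J = \sum_{j=1}^{K} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2$$

The assignment step and the centroid-update step each leave $J$ unchanged or lower it, which is why the standard algorithm converges, but nothing guarantees that the point it reaches is the global minimum.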

Last but not least, run K-means several times with the same $K$ and compare the results, so you get an idea of how stable the solution is for your problem.
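A rough sketch of such a comparison, assuming one-dimensional data and initial centroids sampled from the list: the function `kmeans_1d`, the data, and the choice to keep the old centroid when a cluster ends up empty are all illustrative assumptions, not a description of your code.

```python
import random


def kmeans_1d(points, k, seed, max_iter=100):
    """Standard (Lloyd-style) K-means on a list of numbers.

    Returns (centroids, clusters, sse), where sse is the within-cluster
    sum of squared distances.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # raises ValueError if k > len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centroids[i]) ** 2)
            clusters[nearest].append(p)
        # Update step: an empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    sse = sum((p - centroids[i]) ** 2
              for i, c in enumerate(clusters) for p in c)
    return centroids, clusters, sse


# Illustrative data; each seed gives a different initial set of centroids.
points = [1.0, 1.2, 1.9, 5.0, 5.3, 9.0, 9.4, 9.8]
results = [kmeans_1d(points, k=3, seed=s) for s in range(10)]

for seed, (centroids, clusters, sse) in enumerate(results):
    empty = sum(1 for c in clusters if not c)
    print(seed, [round(m, 3) for m in centroids], round(sse, 3), empty)
```

If the printed sums of squares differ noticeably between seeds, or some runs report empty clusters, the solution depends on initialization and a smaller $K$ or a different seeding strategy may be worth trying.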
