I have an exam on the k-means algorithm and clustering, and I was wondering if anyone knows how to work through this sample exam question. My teachers have been hopeless at providing any information on how to solve it. Thank you.
Best Answer
In your teachers' defense, the question sounds fairly self-explanatory. If you can pinpoint which part is causing you trouble, I can be more specific. In any case, to give some graphical intuition, have a look at the figure below (from Bishop's book):
(a) The blue and red crosses in subplot (a) are the seeds (s1 and s2) that your teacher gives.
(b) This corresponds to your subquestions i and ii. You first calculate the distance from every green point to the red and blue crosses (your initial guesses s1 and s2), and then paint each point blue or red depending on whether it is closer to s1 or s2. As an additional point, your teacher also asks you to give the points that are closest to the cluster centres.
(c) This is subquestion iii. Based on how you painted the points in the previous step, you (re)calculate your cluster centres (s1 and s2) by taking the average of all blue points and, separately, the average of all red points.
(d to i) When you repeat steps i to iii (given in your question) enough times, in practice until the colouring of the points stops changing, you end up with better cluster centres that partition your green points into distinct red and blue groups.
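
If it helps to see the steps above as code, here is a minimal Python sketch of the same loop. I don't have the numbers from your exam sheet, so the points and the seeds s1 and s2 below are made up, and the function names (`assign`, `update`, `kmeans`) are just labels I picked for illustration. It also assumes plain Euclidean distance, breaks ties in favour of the first centre, and leaves a centre where it is if no point gets assigned to it.

```python
import math


def assign(points, centres):
    """Steps i and ii: label each point with the index of its nearest centre."""
    labels = []
    for p in points:
        distances = [math.dist(p, c) for c in centres]
        # index(min(...)) picks the first centre on a tie (an assumption).
        labels.append(distances.index(min(distances)))
    return labels


def update(points, labels, centres):
    """Step iii: move each centre to the mean of the points assigned to it."""
    new_centres = []
    for k, old_centre in enumerate(centres):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            # Assumption: an empty cluster keeps its previous centre.
            new_centres.append(old_centre)
            continue
        mean = tuple(sum(axis) / len(members) for axis in zip(*members))
        new_centres.append(mean)
    return new_centres


def kmeans(points, seeds, max_iter=100):
    """Repeat assign/update until the labels stop changing."""
    centres = list(seeds)
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centres)
        if new_labels == labels:
            break  # nothing was repainted, so the centres will not move either
        labels = new_labels
        centres = update(points, labels, centres)
    return labels, centres


# Made-up 2-D points and seeds, NOT the ones from the exam question.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
s1, s2 = (1.0, 1.0), (2.0, 1.0)

labels, centres = kmeans(points, [s1, s2])
print(labels)   # cluster index (0 for s1, 1 for s2) of each point
print(centres)  # final positions of s1 and s2
```

With these made-up numbers, the first pass paints only the first two points as belonging to s1, and the second pass moves (2, 1) over to s1 as well. After that nothing changes, which is the stopping point. Working through the distances by hand for the first pass is good practice for the exam question.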