I have a set of $d$-dimensional vectors $\{v_1,v_2,\dots,v_n\}$, each of which has been assigned a label from a set $S=\{s_1,s_2,\dots,s_k\}$. I would like to find another set of labels $T=\{t_1,t_2,\dots,t_l\}$, where $l < k$, such that all vectors having the same $S$ label also have the same $T$ label. In other words, $T$ is a strictly coarser clustering than $S$. My question is: what is a good way to go about finding this $T$ clustering?

The obvious approach would be to take the mean of all of the vectors having a given $S$ label, and then cluster the resulting $k$ mean vectors. However, I feel like this throws away a lot of potentially useful information about the distribution of the vectors that went into computing those means. Is there another method for finding this $T$ clustering that makes better use of the $v$ vectors? Thanks in advance.


#### Best Answer

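Hierarchical agglomerative clustering might work for you. It typically starts with each data point in its own cluster, then iteratively merges pairs of clusters to form larger and larger clusters. Since you already have an initial clustering, you'd start from that instead of individual points and keep merging until $l$ clusters remain. To determine your merging procedure, you'd need to decide on the distance metric and the linkage criterion. The linkage criterion defines the distance between two clusters and therefore determines which pair of clusters to merge next. Criteria such as single, complete, or average linkage are computed from the distances between the individual vectors in the two clusters, so they use more of the distribution than the cluster means alone. Many different criteria are discussed here.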
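
The merging procedure described above could be sketched roughly as follows. This is only a minimal illustration, assuming Euclidean distance and average linkage (both are choices made here, not something the question fixes). The function name `coarsen_labels` and its parameters are hypothetical, not part of any library. It builds the full $n \times n$ distance matrix, so it is only suitable for modest $n$.

```python
import numpy as np

def coarsen_labels(X, s_labels, n_coarse):
    """Merge the groups defined by s_labels until n_coarse groups remain.

    Assumption: average linkage, i.e. the distance between two groups is
    the mean Euclidean distance over all pairs of member vectors.
    """
    X = np.asarray(X, dtype=float)
    s_labels = np.asarray(s_labels)

    # Pairwise Euclidean distances between all vectors, shape (n, n).
    diffs = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diffs ** 2).sum(axis=-1))

    # Start from the existing S clustering instead of singleton points.
    s_values = np.unique(s_labels).tolist()
    clusters = [[s] for s in s_values]                          # S labels per cluster
    members = [np.flatnonzero(s_labels == s) for s in s_values]  # row indices per cluster

    while len(clusters) > n_coarse:
        # Find the pair of clusters with the smallest average-linkage distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = D[np.ix_(members[i], members[j])].mean()
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best

        # Merge cluster j into cluster i (j > i, so deleting j keeps i valid).
        clusters[i] = clusters[i] + clusters[j]
        members[i] = np.concatenate([members[i], members[j]])
        del clusters[j]
        del members[j]

    # Map every S label to the index of the coarse cluster that contains it.
    s_to_t = {s: t for t, group in enumerate(clusters) for s in group}
    t_labels = np.array([s_to_t[s] for s in s_labels.tolist()])
    return s_to_t, t_labels
```

Calling `coarsen_labels(X, s_labels, l)` returns a mapping from each $S$ label to a $T$ label, plus the $T$ label of every vector. Because whole $S$ groups are merged, vectors sharing an $S$ label always share a $T$ label. Swapping the `.mean()` for `.min()` or `.max()` would give single or complete linkage instead.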