Given a hierarchical clustering of data points, some of which are labeled, are there good ways to use the tree/dendrogram to make predictions for the unlabeled points?

One approach might be to find the "best" place to cut the tree so that clusters match labels.

I'd be especially interested in efficient ways to cut each cluster at a different height. But I'd also be interested in approaches that don't use "hard" cuts to make the predictions.


#### Best Answer

My answer covers an approach that doesn't use hard cuts on the dendrogram. I would suggest using something like linear discriminant analysis (LDA), or any other technique that lets you predict the class of the unlabeled points. (Many techniques can do the job, but I find LDA the easiest.)

LDA is used when you have a set of multivariate observations, each of which belongs to a known class. This dataset is made up of the labeled points. On the other hand, you may have a set of points measured on the same variables but whose class is unknown. These are the unlabeled points.

The goal of LDA is to assign the unknown points to the given classes. It is important to note that, in your case, the classes are defined by the hierarchical clustering you've already performed.

Discriminant analysis tries to define linear boundaries between the classes, creating "territories" (or regions) for each class. For any unlabeled point, you then check which region it falls in.
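To make the idea of regions concrete, here is a minimal sketch of LDA in plain Python. It assumes exactly two features, a covariance matrix shared by all classes, and priors estimated from class frequencies. The function names `fit_lda` and `predict_lda` and the toy data are made up for illustration; in practice you would use an existing implementation rather than this.

```python
import math

def fit_lda(points, labels):
    """Estimate class means, priors, and the inverse pooled covariance (2 features)."""
    classes = sorted(set(labels))
    n = len(points)
    means, priors = {}, {}
    sxx = sxy = syy = 0.0
    for c in classes:
        members = [p for p, l in zip(points, labels) if l == c]
        mx = sum(p[0] for p in members) / len(members)
        my = sum(p[1] for p in members) / len(members)
        means[c] = (mx, my)
        priors[c] = len(members) / n
        # Accumulate within-class scatter around each class mean.
        for x, y in members:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
    dof = n - len(classes)  # pooled covariance divides by n minus number of classes
    sxx, sxy, syy = sxx / dof, sxy / dof, syy / dof
    det = sxx * syy - sxy * sxy
    inv = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    return means, priors, inv

def predict_lda(model, point):
    """Return the class whose linear discriminant score is largest at `point`."""
    means, priors, inv = model

    def score(c):
        mx, my = means[c]
        # w = inverse covariance times the class mean
        wx = inv[0][0] * mx + inv[0][1] * my
        wy = inv[1][0] * mx + inv[1][1] * my
        # x.w - 0.5 * mu.w + log(prior): linear in x, so boundaries are straight lines
        return point[0] * wx + point[1] * wy - 0.5 * (mx * wx + my * wy) + math.log(priors[c])

    return max(means, key=score)

# Toy data (invented): two well-separated groups of labeled points.
labeled_points = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.6),
                  (6.0, 7.0), (6.5, 6.2), (7.0, 7.5), (5.8, 6.6)]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
unlabeled_points = [(1.8, 2.0), (6.2, 7.1)]

model = fit_lda(labeled_points, labels)
predictions = [predict_lda(model, p) for p in unlabeled_points]
print(predictions)
```

Each class gets a score that is linear in the point's coordinates, so the set of points where two classes tie is a straight line. That line is the boundary between their regions.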

You can check this lecture on LDA. It is simple and explains the method well enough.

If you need software to accomplish this task, you might check the R documentation for the functions `lda()` and `predict.lda()` in the `MASS` package. Check Quick-R for additional help. SPSS also has a very easy implementation of LDA.

In fact, I think it is very odd to "force" the dendrogram to include the unlabeled points. Normally, the dendrogram is built using all of the unlabeled points in a dataset.

Hope this is useful 🙂
