Solved – Does it make sense to run DBSCAN on the output from t-SNE?

I performed time series clustering, using DTW to generate a distance matrix. The distance matrix was then given as input to t-SNE, and the two-dimensional output from t-SNE was clustered with DBSCAN.

Does it make sense?

t-SNE is a manifold learning technique and, as such, does not preserve distances or densities; therefore it is not recommended to run distance-based (e.g. k-means) or density-based (e.g. DBSCAN) clustering algorithms on the output of t-SNE. This has been asked before.

If you want a dimensionality reduction algorithm that does preserve distances, you can use PCA instead of t-SNE. PCA gives you an orthogonal rotation of your original (centered) data; one of the properties of an orthogonal transformation is that it preserves distances. When you use PCA for dimensionality reduction, you project into a lower-dimensional space by throwing out the components with small eigenvalues, so you lose only a small amount of information about distance.
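As a rough illustration of the distance-preservation claim, here is a minimal NumPy sketch. It assumes the points are available as ordinary feature vectors (which is not the case when all you have is a DTW distance matrix), and the synthetic data, the `scales` vector and the `pairwise_distances` helper are made up for the example. PCA is done by hand via the SVD of the centered data.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X (hypothetical helper)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

rng = np.random.default_rng(0)

# Assumed toy data: 200 points in 10 dimensions, where almost all of the
# variance lives along the first two axes.
scales = np.array([5.0, 3.0] + [0.1] * 8)
X = rng.normal(size=(200, 10)) * scales

# PCA via SVD of the centered data; the rows of Vt are the principal axes.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

Z_full = Xc @ Vt.T      # keep all components: an orthogonal rotation
Z_2d = Z_full[:, :2]    # keep only the two largest-variance components

D_orig = pairwise_distances(Xc)
D_full = pairwise_distances(Z_full)
D_2d = pairwise_distances(Z_2d)

# Rotation alone changes distances only by floating-point noise.
max_err_full = np.abs(D_orig - D_full).max()

# Dropping low-variance components shrinks distances slightly.
rel_err_2d = np.linalg.norm(D_orig - D_2d) / np.linalg.norm(D_orig)

# Fraction of total variance kept by the first two components.
explained = (S[:2] ** 2).sum() / (S ** 2).sum()

print(f"max |D_orig - D_full|    = {max_err_full:.2e}")
print(f"relative error of D_2d   = {rel_err_2d:.4f}")
print(f"variance explained by 2D = {explained:.4f}")
```

The truncated projection can only shrink distances, never stretch them, and the distortion is small here only because the discarded components carry very little variance. If the retained components explain much less of the variance, the distortion grows accordingly.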
