I have observed that when I significantly reduce the dimensionality of my data, the silhouette score increases drastically. I reduced the dimensionality so that only 10% of the variance is retained.
Without dimensionality reduction, I get silhouette scores of ~0 on average. With dimensionality reduction that keeps only 10% of the variance, I get a score of ~0.78.
Based on the silhouette score, is the data actually better clustered in this low dimensionality, or have I manipulated the data too much for this score to be reliable?
Best Answer
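Never compare silhouette scores across different preprocessing, and in particular not across different feature sets.
The silhouette is computed from pairwise distances between points, so changing the features changes the quantity being measured. This is comparing apples and oranges.
If you want to see whether the clusters found after PCA are better, compute the silhouette on the original data using those cluster labels.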
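As a rough illustration of that suggestion, here is a minimal sketch using scikit-learn. The random data, the choice of k-means, and the number of clusters (3) are assumptions made only for the example; none of them come from the question. The point is only which pair of scores can be compared.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

# Hypothetical stand-in data: 300 samples, 50 features (replace with your own).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))

# Baseline: cluster in the original space and score in the original space.
labels_orig = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
score_orig = silhouette_score(X, labels_orig)

# Reduce with PCA, keeping just enough components to explain 10% of the variance.
X_red = PCA(n_components=0.10, svd_solver="full").fit_transform(X)

# Cluster in the reduced space.
labels_red = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_red)

# Scored in the reduced space: uses different distances, so it is NOT
# comparable to score_orig.
score_red_in_red = silhouette_score(X_red, labels_red)

# Scored in the original space with the labels found after PCA:
# same distances as score_orig, so this comparison is meaningful.
score_red_in_orig = silhouette_score(X, labels_red)

print(f"original labels, original space: {score_orig:.3f}")
print(f"PCA labels, reduced space:       {score_red_in_red:.3f}")
print(f"PCA labels, original space:      {score_red_in_orig:.3f}")
```

Only the first and third numbers are measured on the same distances, so only those two say whether clustering after PCA produced a better partition of the original data.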