Solved – Is it OK to use correlated variables for cluster analysis

I know there is a series of regression diagnostic procedures (correlation, beta, residuals, etc.) to run before, during, and after a regression analysis. Is there a comparable standard procedure to follow for cluster analysis (for example, with Ward's method)? What are the R commands? Thanks!

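Correlated variables can cause problems for many clustering algorithms: the information they share is effectively counted more than once, which gives those attributes extra weight in the distance computation. For k-means, for example, whitening the data first seems to be considered best practice.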
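To make "whitening" concrete, here is a minimal, dependency-free sketch in Python rather than R. It uses Cholesky whitening, which is only one of several whitening variants (PCA and ZCA whitening are others), and it assumes the sample covariance matrix is positive definite, i.e. no variable is an exact linear combination of the others. The helper names `covariance`, `cholesky`, and `whiten` and the toy data are made up for illustration; in practice you would more likely use a numerical library.

```python
import math

def covariance(rows):
    """Return the sample covariance matrix (n - 1 denominator) and column means."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    return cov, means

def cholesky(a):
    """Return lower-triangular L with L * L^T == a (a must be positive definite)."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def whiten(rows):
    """Center the data, then solve L z = x - mean for every row (forward substitution)."""
    cov, means = covariance(rows)
    L = cholesky(cov)
    out = []
    for r in rows:
        c = [x - m for x, m in zip(r, means)]
        z = []
        for i in range(len(c)):
            z.append((c[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
        out.append(z)
    return out

# Toy data (made up): the second column is roughly twice the first,
# so the two variables are strongly correlated.
data = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]]
white = whiten(data)
cov_white, _ = covariance(white)  # close to the identity matrix
```

After this transform the variables have unit variance and zero correlation in the sample, so Euclidean distance (and therefore k-means) no longer double-counts the shared information. Whether that is desirable depends on the data: whitening also rescales low-variance directions, which can amplify noise.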

However, there are also correlation clustering algorithms designed for data that contains multiple correlations; they cluster objects according to the correlations they exhibit.
