I have a dataset where each row represents a sample and each sample is described by its chemical composition. You can see the first 10 rows of the dataset in figure 1.
Figure 1 – Each row represents a sample, and each sample is broken down into 17 chemical compounds plus the total (all values are given as percentages).
First I computed the correlation between the samples and built the correlation matrix shown in figure 2.
But what I really want is to cluster the chemical compounds that are more likely to be found together in a sample.
Best Answer
You seem to be looking for cluster analysis. Cluster analysis groups data according to some distance measure, and correlation may well be the basis for your distance measure (*). Since you have not mentioned any rule for how strongly samples should correlate to end up together in one group, hierarchical cluster analysis might be in order: it reveals visually how many groups form depending on where you set the cutoff.
(*) https://www.datanovia.com/en/lessons/clustering-distance-measures/ writes:
Correlation-based distance considers two objects to be similar if their features are highly correlated, even though the observed values may be far apart in terms of Euclidean distance. The distance between two objects is 0 when they are perfectly correlated. Pearson’s correlation is quite sensitive to outliers. […]
If we want to identify clusters of observations with the same overall profiles regardless of their magnitudes, then we should go with correlation-based distance as a dissimilarity measure
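To make this concrete, here is a minimal sketch of hierarchical clustering on a correlation-based distance, written for the case in the question where the compounds (columns) are the objects to group. It assumes the data sits in a pandas DataFrame with one column per compound; the function name `cluster_compounds`, the column names, the `1 - r` distance, average linkage, and the cutoff of 0.5 are all illustrative choices, not something given in the question, and the demo data is synthetic.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_compounds(df, cutoff=0.5):
    """Group the columns of df using 1 - Pearson correlation as distance.

    Hypothetical helper: the name, the linkage method and the default
    cutoff are assumptions made for illustration.
    """
    corr = df.corr(method="pearson")      # compound-by-compound correlation
    dist = 1.0 - corr.to_numpy()          # 0 when perfectly correlated
    dist = (dist + dist.T) / 2.0          # remove floating-point asymmetry
    np.fill_diagonal(dist, 0.0)           # exact zeros on the diagonal
    condensed = squareform(dist, checks=False)
    tree = linkage(condensed, method="average")
    # Cut the tree: compounds merged below `cutoff` share a cluster label.
    labels = fcluster(tree, t=cutoff, criterion="distance")
    return pd.Series(labels, index=df.columns, name="cluster"), tree


# Synthetic stand-in for the real table (made-up columns, not real data):
# A and B follow one hidden signal, C and D follow another.
rng = np.random.default_rng(0)
n = 200
signal_1 = rng.normal(size=n)
signal_2 = rng.normal(size=n)
demo = pd.DataFrame({
    "compound_A": signal_1 + 0.05 * rng.normal(size=n),
    "compound_B": signal_1 + 0.05 * rng.normal(size=n),
    "compound_C": signal_2 + 0.05 * rng.normal(size=n),
    "compound_D": signal_2 + 0.05 * rng.normal(size=n),
})

labels, tree = cluster_compounds(demo, cutoff=0.5)
print(labels)
```

Passing `tree` to `scipy.cluster.hierarchy.dendrogram` (with `labels=list(demo.columns)`) draws the hierarchy, which is where the effect of moving the cutoff becomes visible. To cluster samples instead of compounds, the same sketch can be applied to the transposed table.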