I have a data set giving the number of visits to ~20 web pages for a total of ~3000 users. To identify "similar" users according to the number of visits to each web page, I ran a k-means clustering.
I now know which user belongs to which of the k = 3 clusters (the value of k is irrelevant here). But how can I characterize the clusters? Is there a way to reach a conclusion such as "User X belongs to the cluster of users that like web pages about News and Politics"?
You used a single metric to classify the users into clusters? I'll assume you have additional, descriptive information about these users. One heuristic is to summarize the central tendency of each descriptor (e.g., means or medians) within each cluster, using the cluster assignments. So, if you have k=3 clusters and x=20 descriptors or features (the specific values of k and x are irrelevant), the output is a 20 (rows) by 3 (columns) summary matrix for analysis. Next, to determine how the clusters differ on each descriptor, create an index: divide the cluster value by the global value across all users for that descriptor, then multiply by 100. This index reads like an IQ score, where 100 is "normal" and values of 120 or more, or 80 or less, flag descriptors on which the cluster's behavior diverges from the norm. The 120 and 80 cutoffs are rules of thumb that act as "quick and dirty" stand-ins for significance tests of between-group (cluster) differences; they are not formal tests.
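The index described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `cluster_index`, the page names, the visit counts, and the cluster labels are all made up, and the per-page visit counts themselves stand in for the descriptors. Means are used as the central tendency; medians would work the same way.

```python
from collections import defaultdict
from statistics import mean


def cluster_index(rows, labels):
    """Return {cluster: [index per descriptor]}.

    index = 100 * (cluster mean) / (global mean), so 100 means
    "same as the average user" on that descriptor.
    """
    n_features = len(rows[0])
    global_means = [mean(r[j] for r in rows) for j in range(n_features)]

    # Group the rows by their cluster assignment.
    by_cluster = defaultdict(list)
    for row, label in zip(rows, labels):
        by_cluster[label].append(row)

    result = {}
    for label, members in by_cluster.items():
        result[label] = [
            100 * mean(m[j] for m in members) / global_means[j]
            if global_means[j] else float("nan")  # avoid dividing by zero
            for j in range(n_features)
        ]
    return result


# Hypothetical toy data: 4 users, 3 pages, labels from an earlier k-means run.
pages = ["news", "politics", "sports"]
visits = [
    [10, 8, 1],
    [12, 9, 0],
    [1, 0, 9],
    [0, 1, 11],
]
labels = [0, 0, 1, 1]

index = cluster_index(visits, labels)
for label, values in sorted(index.items()):
    over = [p for p, v in zip(pages, values) if v >= 120]
    under = [p for p, v in zip(pages, values) if v <= 80]
    print(f"cluster {label}: over-indexed on {over}, under-indexed on {under}")
```

In this toy example cluster 0 over-indexes on news and politics and under-indexes on sports, which is the kind of statement the question asks for. With real data the same table would have one row per descriptor and one column per cluster.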