Solved – Matrix clustering based on a Jaccard distance cutoff

I'm trying to figure out how to group elements of a binary matrix based on a given Jaccard distance cutoff. For example, suppose that I have information on the types of food carried by various grocery stores, and I want to group the stores in such a way that each store in the group is at least 30% similar to at least one other store in the same group.

The input looks like this:

> groceries        food1 food2 food3 food4 food5 food6 food7 store1     1     1     0     0     0     0     0 store2     1     0     1     1     0     0     0 store3     1     0     0     1     0     0     0 store4     1     1     0     0     0     0     0 store5     0     0     0     0     1     1     1 store6     0     0     0     0     1     0     1 

And I'm able to calculate the Jaccard distances like so:

> d=dist((groceries), method="binary") > d           store1    store2    store3    store4    store5 store2 0.7500000                                         store3 0.6666667 0.3333333                               store4 0.0000000 0.7500000 0.6666667                     store5 1.0000000 1.0000000 1.0000000 1.0000000           store6 1.0000000 1.0000000 1.0000000 1.0000000 0.3333333 

I can eyeball the distance object and see that some of the stores meet my 30% similarity cutoff (Jaccard distance <= 0.7):

store1 & store3 store1 & store4 store2 & store3 store3 & store4 store5 & store6 

So I'd like there to be two groups: one made up of stores 1, 2, 3, and 4, and another made up of stores 5 and 6. Eyeballing the distance matrix obviously doesn't scale, though – is there a way to do this automatically?

[I'm particularly interested in being able to do it in R, so if there's an R function that can do it automatically, that would be most useful.]

This is the components you get with single linkage clustering at cutoff 0.7, which is the exact same thing as constructing a graph with edge weight cutoff 0.7 and computing the connected components of that graph.

Similar Posts:

Rate this post

Leave a Comment