The problem I am trying to solve is finding the probability that two people are actually the same person by cross-referencing their associates. For example, if Person A is associated with the following people:

- Jeff
- Rick
- Jessica
- Mary

Person B is associated with the following people:

- Ryan
- Mary
- Dennis
- Scott
- Jeff
- Sharon
- Rick
- Larry
- James

So these two people have the following people in common:

- Mary
- Jeff
- Rick

How would I go about figuring out the likelihood that Person A and Person B are the same person, based on their common relationship with the three people above? There are three factors I can see right now, but I don't know how to weigh any of them:

- Ratio of common associates (doubled because seen from both sides) over the total number of associates
- Ratio of common associates over the number of associates for Person A
- Ratio of common associates over the number of associates for Person B
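For concreteness, here is a minimal sketch of the three ratios for the example lists, using Python sets. The variable names are made up for illustration, and the first ratio assumes "total number of associates" means the size of A's list plus the size of B's list (under that reading it matches what is usually called the Sørensen-Dice coefficient).

```python
# Associates from the example above.
person_a = {"Jeff", "Rick", "Jessica", "Mary"}
person_b = {"Ryan", "Mary", "Dennis", "Scott", "Jeff",
            "Sharon", "Rick", "Larry", "James"}

# Associates that appear in both lists.
common = person_a & person_b

# Factor 1: common associates counted from both sides, over the combined
# list sizes (assumption: "total" = len(A) + len(B)).
both_sides = 2 * len(common) / (len(person_a) + len(person_b))

# Factor 2: share of Person A's associates that are common.
share_of_a = len(common) / len(person_a)

# Factor 3: share of Person B's associates that are common.
share_of_b = len(common) / len(person_b)

print(sorted(common), both_sides, share_of_a, share_of_b)
```

For these lists the three values are 6/13, 3/4 and 3/9. They are overlap scores, not probabilities, so they still need to be weighed or calibrated somehow.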

I'm not a statistician, so I don't know if what I've presented is the correct way to solve the problem. Can anyone provide some guidance?

#### Best Answer

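I think the Jaccard index (Jaccard similarity coefficient) would be suitable for this problem. It is the number of common associates divided by the number of distinct associates across both people, |A ∩ B| / |A ∪ B|. For the example in the question that is 3 / 10 = 0.3, and the Jaccard distance is 1 minus that value, 0.7. When the sets are large, the MinHash algorithm estimates the Jaccard similarity coefficient without comparing every element.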
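A rough sketch of both ideas in Python follows. The helper names (`jaccard_similarity`, `minhash_estimate`, and so on) are hypothetical, and the MinHash part is a simplified illustration that simulates many hash functions by salting SHA-256 with a seed; it is not how any particular library implements it.

```python
import hashlib


def jaccard_similarity(a, b):
    """Exact Jaccard similarity: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


def _seeded_hash(name, seed):
    # One "hash function" per seed, built by salting SHA-256 (an assumption
    # made for simplicity, not a tuned MinHash hash family).
    digest = hashlib.sha256(f"{seed}:{name}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items, num_hashes=256):
    # For each seeded hash function keep the smallest hash value in the set.
    # Assumes `items` is non-empty.
    return [min(_seeded_hash(x, seed) for x in items)
            for seed in range(num_hashes)]


def minhash_estimate(a, b, num_hashes=256):
    # The fraction of signature positions that agree estimates the
    # Jaccard similarity of the two sets.
    sig_a = minhash_signature(a, num_hashes)
    sig_b = minhash_signature(b, num_hashes)
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / num_hashes


person_a = {"Jeff", "Rick", "Jessica", "Mary"}
person_b = {"Ryan", "Mary", "Dennis", "Scott", "Jeff",
            "Sharon", "Rick", "Larry", "James"}

exact = jaccard_similarity(person_a, person_b)    # 3 common / 10 distinct
estimate = minhash_estimate(person_a, person_b)   # approximation of `exact`
print(exact, estimate)
```

For sets this small the exact calculation is all you need. MinHash only pays off when the associate lists are large or when many pairs of people have to be compared.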
