I used two different methods to retrieve two lists, of 1808 and 867 elements, from the same list of 3431 elements. The two lists (1808 and 867) have 683 elements in common. This suggests that the two approaches produce highly overlapping results, but I want a statistical test to support this. I looked into it and found that the hypergeometric test seems relevant, but I am not sure how to perform it on my data set. I assume the hypergeometric test takes four variables as input, but I have five. Basically, I want to know: if I pick 1808 elements from the 3431 elements and, separately, 867 elements from the same 3431 elements, what is the probability that there will be 683 elements in common?
PS: I also think this test assumes sampling without replacement.
Best Answer
I found a similar question that had been asked before, and it was exactly what I needed. I should have searched more thoroughly before posting this question. Here is the post I have been referring to: Calculating the probability of gene list overlap between an RNA seq and a ChIP-chip data set
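For readers who land here with the same numbers, here is a minimal sketch of one common way to set up the test. It is not necessarily the exact approach from the linked post. It assumes the four numbers in the question map onto the hypergeometric parameters as follows: population size N = 3431, "marked" elements K = 1808 (the first list), draws n = 867 (the second list), and observed overlap k = 683. The function name `hypergeom_upper_tail` is made up for illustration. The p-value is the upper tail P(X >= k), i.e. the chance of an overlap at least as large as the observed one if the second list were drawn at random without replacement.

```python
from fractions import Fraction
from math import comb


def hypergeom_upper_tail(N, K, n, k):
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n) as an exact fraction.

    N: population size, K: marked elements in the population,
    n: number of draws without replacement, k: observed number of marked draws.
    """
    total = comb(N, n)   # all ways to draw n elements out of N
    hi = min(K, n)       # the overlap cannot exceed either list size
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, hi + 1))
    return Fraction(favorable, total)


# Assumed mapping of the numbers in the question onto the parameters.
N, K, n, k = 3431, 1808, 867, 683

expected_overlap = n * K / N   # mean overlap under random sampling
p_value = hypergeom_upper_tail(N, K, n, k)

print(f"expected overlap under random sampling: {expected_overlap:.1f}")
print(f"P(overlap >= {k}) = {float(p_value):.3e}")
```

The observed overlap of 683 is well above the roughly 457 expected by chance, so the upper-tail probability comes out extremely small. If SciPy is available, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` should give the same tail probability in floating point; the sketch above uses exact integer arithmetic only to avoid extra dependencies.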
Similar Posts:
- Solved – Significance of overlap between multiple lists
- Solved – When sampling without replacement from a given distribution, what’s the total expected weight of the last k sampled items
- Solved – Expected maximum given population size, mean, and variance
- Solved – Plotting a heatmap based on clustering in R
- Solved – How to calculate if the degree of overlap between two lists is significant