I might be overthinking this. I generated the output in R and 5 of my 10 samples were successful, so that's 50%. Given that, if I am to estimate the probability of two or more people in a group of 30 sharing a birthday, what is my total sample? Should I be using combinations?
Best Answer
How are you generating your birthdays? To generate 23 birthdays:
dates = sample(1:365, 23, replace = TRUE)
To see if 2 or more share the same birthday:
length(dates) != length(unique(dates)) # TRUE if there are duplicates
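As a side note, base R's `anyDuplicated()` performs an equivalent check. It returns 0 when there are no duplicates and the index of the first duplicate otherwise. A minimal sketch, where the two example vectors are made up purely for illustration:

```r
# Hypothetical example vectors, chosen only to illustrate the check
no_dupes  <- c(12, 200, 365)
has_dupes <- c(12, 200, 12)

# anyDuplicated() gives 0 if every value is unique,
# otherwise the position of the first repeated value
anyDuplicated(no_dupes) > 0   # FALSE
anyDuplicated(has_dupes) > 0  # TRUE
```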
How often is the above TRUE?
dupe_count = 0
runs = 1000000
for (i in 1:runs) {
  dates = sample(1:365, 23, replace = TRUE)
  if (length(dates) != length(unique(dates))) {
    dupe_count = dupe_count + 1
  }
}
print(dupe_count / runs)
[1] 0.508158
This closely matches the theoretical value of 50.7% for a group of 23 given on the Wikipedia page for the birthday problem.
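For the group of 30 in the question, the same approach carries over. Here is a rough sketch, assuming 365 equally likely birthdays; the helper names `simulate_shared` and `exact_shared` are my own and not from any package. The exact value is 1 minus the probability that all n birthdays are distinct, and `pbirthday()` from the stats package should give the same number.

```r
# Hypothetical helper: estimate P(at least one shared birthday) by simulation.
# Each run draws one group of n birthdays and checks it for duplicates.
simulate_shared <- function(n, runs = 100000) {
  hits <- replicate(runs, anyDuplicated(sample(1:365, n, replace = TRUE)) > 0)
  mean(hits)  # proportion of runs that contained a shared birthday
}

# Hypothetical helper: exact probability, 1 - P(all n birthdays are distinct)
exact_shared <- function(n) {
  1 - prod((365 - 0:(n - 1)) / 365)
}

set.seed(1)          # arbitrary seed, only so the run is reproducible
simulate_shared(30)  # simulated estimate for a group of 30
exact_shared(30)     # about 0.706
exact_shared(23)     # about 0.507
pbirthday(30)        # built-in equivalent from the stats package
```

The "total sample" here is the number of simulated groups (`runs`), not the number of people in each group. With only 10 runs the standard error of the estimate, sqrt(p * (1 - p) / runs), is roughly 0.15, so an observed 50% is not surprising even though the true value for 30 people is near 71%. No combinations are needed for the simulation itself.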