I have two mutually exclusive groups of people and a count of how many events happened in each group.
Let's say group 1 has 7000 people and group 2 has 3000 people.
Group 1 had 50 events and group 2 had 40 events.
I'm calculating the event percentage for each group: for group 1 it's 50/7000, and for group 2 it's 40/3000.
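To illustrate the arithmetic with the example counts, here is a short sketch. It is in Python rather than PHP, and the variable names are only for illustration:

```python
# Example counts from the question: (events, group size)
groups = {"group 1": (50, 7000), "group 2": (40, 3000)}

# Event (conversion) rate = events / group size
rates = {name: events / size for name, (events, size) in groups.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.4%}")  # group 1: 0.7143%, group 2: 1.3333%
```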
I want to calculate how much statistical validity these results have (in other words, are the groups large enough, or do I need to collect more data?). Ideally as a percentage, where >95% means the result is statistically valid.
Can someone point me to how to do this? I need to implement it in PHP, and I have little statistics knowledge. I think it involves a chi-square function, but I'm not sure how to use my data with the PHP chi-square function http://www.php.net/manual/en/function.stats-cdf-chisquare.php
Thanks!
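Since the question mentions a chi-square function, the following is a minimal sketch of a Pearson chi-square test of independence on the 2×2 table of events vs. non-events per group, using the example counts. It is written in Python rather than PHP, and `chi_square_2x2` is a hypothetical helper name. The final p-value step is the kind of quantity a chi-square CDF function (such as the linked PHP one) computes, but check that function's argument conventions in the PHP manual before relying on it.

```python
import math

def chi_square_2x2(events1, n1, events2, n2):
    """Pearson chi-square test of independence (no continuity correction)
    for a 2x2 table: rows = groups, columns = event / no event."""
    observed = [
        [events1, n1 - events1],
        [events2, n2 - events2],
    ]
    total = n1 + n2
    row_totals = [n1, n2]
    col_totals = [events1 + events2, total - events1 - events2]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if the event rate were the same in both groups
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # A 2x2 table has 1 degree of freedom; for 1 df the upper-tail
    # probability is P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

chi2, p_value = chi_square_2x2(50, 7000, 40, 3000)
print(f"chi-square = {chi2:.3f}, p-value = {p_value:.4f}")
```

For a 2×2 table, this statistic equals the square of the *pooled* two-proportion z statistic, so its p-value will be close to, but not identical to, that of an unpooled z test. A p-value below 0.05 (rejecting at the 5% level) is what the "> 95%" criterion in the question usually means in practice.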
ADDITIONAL INFO:
We're talking about visitors to a web store. I randomly split them into 30% group B (test group) and 70% group A. I expose group A to a certain message.
I compare the conversion rates of the groups (% of visitors who buy something), and I want to know when the samples are large enough for the difference to be statistically significant.
Best Answer
You are calculating the mean of a variable that is 1 if there is an event and 0 if not (a Bernoulli random variable). The sum of $N$ such independent variables is binomial with variance $N \times p(1-p)$, so the mean has variance $p(1-p)/N$. We can use a two-sample difference-in-means test to see whether the difference in proportions between the groups is significant. Calculate:
$$\frac{p - q}{\sqrt{p(1-p)/N + q(1-q)/M}}$$
where $p$ is the proportion from group 1, which has $N$ observations, and $q$ is the proportion from group 2, which has $M$ observations. If this number is large in absolute value (greater than 1.96 is a typical threshold, giving a two-sided test at the 5% significance level), you can reject the claim that the two groups have the same proportion of events.
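A minimal sketch of the formula above, applied to the question's numbers. It is written in Python rather than PHP; the arithmetic ports directly, and only the normal-distribution step needs a library function or an approximation. `two_proportion_z` is a hypothetical helper name. The two-sided p-value uses the standard normal identity $P(|Z| \ge |z|) = \mathrm{erfc}(|z|/\sqrt{2})$.

```python
import math

def two_proportion_z(events1, n1, events2, n2):
    """Unpooled two-sample z statistic for a difference in proportions:
    (p - q) / sqrt(p(1-p)/N + q(1-q)/M)."""
    p = events1 / n1
    q = events2 / n2
    se = math.sqrt(p * (1 - p) / n1 + q * (1 - q) / n2)
    if se == 0:
        # Happens when both proportions are 0 or 1; the statistic is undefined
        raise ValueError("standard error is zero; test is not defined")
    z = (p - q) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

z, p_value = two_proportion_z(50, 7000, 40, 3000)
print(f"z = {z:.3f}, two-sided p-value = {p_value:.4f}")
print("significant at 5%" if abs(z) > 1.96 else "not significant at 5%")
```

With the question's counts, $|z|$ comes out a little above 2.6, so by the 1.96 rule the difference would be significant at the 5% level, provided the assumptions about equal within-group probabilities hold.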
This assumes that each person in group 1 has the same probability of an event and each person in group 2 has the same probability of an event, although these probabilities can differ across groups. Since you are randomly assigning people to the groups (i.e., they aren't self-selecting into them), this is a reasonably good assumption.
Unfortunately, I can't help with your PHP coding, but I hope this gets you started.