# Solved – How to determine statistical validity of results

I have two exclusive groups of people and a counter of how many events happened for each group.

Let's say group 1 has 7000 people and group 2 has 3000 people.

I'm calculating the event percentage for each group. For example, for group 1 it's 50/7000 and for group 2 it's 40/3000.

I want to calculate how much statistical validity these results have (in other words, are the groups large enough, or do I need to collect more data?). Probably as a percentage (where >95% means the result is statistically valid).

Can someone point me to how to do it? I need to implement it in PHP code. I have little statistics knowledge. I think it involves the chi-square function, but I'm not sure how to use the data with the PHP chi-square function http://www.php.net/manual/en/function.stats-cdf-chisquare.php

Thanks!

We're talking about visitors to a web store. I divide them randomly into 30% group B (test group) and 70% group A. I expose group A to a certain message.
I compare the conversion rates of the groups (% of visitors who buy something), and I want to know when the samples are large enough for the difference to be statistically significant.


You are calculating the mean of a variable that is 0 if there is no event and 1 if there is an event. The sum of \$N\$ such independent Bernoulli random variables, each with event probability \$p\$, is binomial and has variance \$N \times p(1-p)\$. The mean therefore has variance \$p(1-p)/N\$. We can use a two-sample difference-in-means test to see whether the difference in proportions between the groups is significant. Calculate:

\$\$\begin{equation*}\frac{p - q}{\sqrt{p(1-p)/N + q(1-q)/M}}\end{equation*}\$\$

where \$p\$ is the proportion from group 1, which has \$N\$ observations, and \$q\$ is the proportion from group 2, which has \$M\$ observations. If this number is large in absolute value (larger than 1.96 is the usual threshold, which gives a two-sided hypothesis test with a significance level of 5%), then you can reject the claim that the two groups have the same proportion of events.
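As a rough illustration, here is how the statistic above could be computed for the numbers in the question (50/7000 and 40/3000). This is only a sketch, written in Python rather than PHP; the function name `two_proportion_z` and its parameter names are made up for this example. The two-sided p-value is an addition that assumes the statistic is approximately standard normal, which is reasonable for samples this large. Only basic arithmetic, a square root and the complementary error function are needed, so the same steps should carry over to other languages.

```python
import math

def two_proportion_z(events_1, n_1, events_2, n_2):
    """Return (z, two-sided p-value) for the difference in two proportions.

    Hypothetical helper for illustration: it implements the formula
    (p - q) / sqrt(p(1-p)/N + q(1-q)/M) from the answer above.
    """
    p = events_1 / n_1  # proportion in group 1
    q = events_2 / n_2  # proportion in group 2

    # Variance of each sample proportion, added because the groups are independent.
    variance = p * (1 - p) / n_1 + q * (1 - q) / n_2
    if variance == 0:
        raise ValueError("both proportions are 0 or 1; the statistic is undefined")

    z = (p - q) / math.sqrt(variance)

    # Assumption: z is approximately standard normal under the null hypothesis,
    # so the two-sided p-value is P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

z, p_value = two_proportion_z(50, 7000, 40, 3000)
print(f"z = {z:.3f}, two-sided p-value = {p_value:.4f}")
print("significant at the 5% level" if abs(z) > 1.96 else "not significant at the 5% level")
```

With these counts the statistic comes out at roughly -2.66, which is larger than 1.96 in absolute value, so the difference between the groups would be judged significant at the 5% level.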

This assumes that each person in group 1 has the same probability of having an event and each person in group 2 has the same probability of having an event, but these probabilities can differ across groups. Since you are randomly assigning people to the groups (i.e., they aren't self-selecting into them), this is a reasonably good assumption.

Unfortunately, I can't help with your PHP coding, but I hope that this gets you started.
