Solved – Statistical significance when A/B test has multiple values

I'm sure this is a pretty standard statistics question, but I'm no expert… I'm running an A/B test on my website to see if a change results in users adding more content. So there are 2 basic things I'm looking at: the # of users adding at least 1 piece of content, and the total # of pieces of content added by all users.

I really care much more about the total # of pieces of content added by all users. I'll make the change permanent if I know it's at least not worse than the existing site. So I need to know how many samples (users logging in) I need to have a 95% confidence level. Normally I can use one of the many web A/B test calculators that use chi-square and similar tests to figure out if my test is statistically significant or to figure out the sample size I need. In the first case of seeing how many users added content, I can do this. But to see the total pieces of content added among all users, I can't use those tests as there isn't a "conversion" event.

So what's the best way to see what sample size I need to be statistically significant to a 95% confidence level? And how can I see if my test shows whether there was a difference? Again I just want to make sure the new change isn't worse (or isn't "much" worse, do I need to and how should I define "much"?).

The perspective Ralu is using is basically that p is the probability of A. For the binomial, he's saying you have the events A and not-A (which for you is B), and that's your event space. Since you don't know your actual value for P(A), and assuming you don't have a good guess for it, you'll want to use a conservative estimate of 0.5. Plugging that into the equation in the other answer implies you need 16 observations in your sample. However, I'm not sure the binomial is the best choice in this case.

When determining sample size there are two things you're going to want to decide. The first is your confidence level (which you have as 95%); the second is the margin of error that is acceptable for your analysis.

It might be worth considering Example 8.10 in Wackerly et al., since it actually looks at determining the sample size for two sample groups, which is your situation.

The explanation in the example seems thorough enough (if you have questions, though, please ask), but in case you don't click through to take a look, it results in $n = 1/[(1/1.96)^{2}/8] = 8(1.96)^{2} \approx 30.7$, which rounds up to $n = 31$ in each group, so $2n = 62$ in total. Notice that this is nearly four times the size of what the other method suggested; it is also larger than the 40 samples usually cited for the Central Limit Theorem, which gives it good properties.

However, that has a fairly large margin of error (1, i.e. 100%). Let's say instead you wanted a very small margin of error, such as 4% (0.04):

$n = 1/[(0.04/1.96)^{2}/8] = 8(1.96/0.04)^{2} = 19208$ in each group, so $2n = 38416$ in total.

Remember, though, that in this example we assumed the range was 8 and used the approximation that $4\sigma$ is roughly equal to the range (so $\sigma \approx 2$), which may or may not make sense for your problem. The range is max minus min, so if your current data show a very different range, you may want to use that value instead and recalculate accordingly.
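To make it easier to redo the calculation with your own range and margin of error, here is a minimal Python sketch of the same arithmetic. It assumes equal group sizes, equal standard deviations in both groups, and the rule of thumb that the range is about four standard deviations. The function name `per_group_sample_size` and its parameters are just illustrative, not something from the textbook.

```python
import math
from statistics import NormalDist


def per_group_sample_size(margin, value_range, confidence=0.95):
    """Users needed in EACH group so that the difference between the two
    group means is estimated to within `margin` at the given confidence."""
    # Two-sided z value: about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Assumption: range is roughly 4 standard deviations, so sigma = range / 4.
    sigma = value_range / 4
    # Solve z * sqrt(sigma^2/n + sigma^2/n) = margin for n.
    n = 2 * sigma ** 2 * (z / margin) ** 2
    return math.ceil(n)


print(per_group_sample_size(1, 8))     # margin of 1, range of 8
print(per_group_sample_size(0.04, 8))  # margin of 0.04, range of 8
```

With a range of 8 this reproduces the 31 and 19208 per group worked out above; swap in the range you actually observe to see how much the requirement changes.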

As for determining whether there was a difference, you're going to want to use a hypothesis test; in particular, a two-sample t-test. Your null hypothesis is that the means are equal. In your case you want to know if the new one is greater than the old one, so you'll want a one-tailed alternative hypothesis (also called a directional hypothesis). Once you've calculated the t statistic using the formulas in that link, you'll need to find the corresponding critical value from a table. For your confidence level you'll want to know if the test statistic is greater than 1.644854 (the large-sample value). If it is, then the new mean, which we're going to call mu1, is greater than the old one, mu2. If it is not greater than that value, you fail to reject the null hypothesis because your evidence isn't strong enough.
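The test described above can be sketched in a few lines of Python using only the standard library. This is a rough illustration rather than the exact formulas from the link: it assumes the unequal-variance (Welch) form of the t statistic, and it uses the normal critical value, which is where 1.644854 comes from and is only a good approximation when both groups are large. The function name and the idea of passing per-user content counts as lists are assumptions made for the example.

```python
import math
from statistics import NormalDist, mean, variance


def one_sided_two_sample_test(new, old, alpha=0.05):
    """Test H0: mean(new) == mean(old) against H1: mean(new) > mean(old).

    `new` and `old` are lists of per-user content counts (hypothetical data).
    Returns (t_stat, critical_value, reject_h0).
    """
    # Standard error of the difference in means (Welch form, unequal variances).
    se = math.sqrt(variance(new) / len(new) + variance(old) / len(old))
    t_stat = (mean(new) - mean(old)) / se
    # Large-sample one-tailed critical value: about 1.644854 for alpha = 0.05.
    critical = NormalDist().inv_cdf(1 - alpha)
    return t_stat, critical, t_stat > critical
```

You would call it as `one_sided_two_sample_test(new_counts, old_counts)` and look at the third value returned. For small groups, look up the critical value in a t table with the appropriate degrees of freedom instead of relying on 1.644854.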

Hopefully this helps!
