How to calculate statistical significance

You’ve set up your first A/B test, and the results are pouring in; now you’re wondering, “how the heck do I know which one is better?” Enter statistical significance.

So which test performed better?

Well, first things first: start by calculating the conversion rate for each of the tests. This can be done by simply dividing the number of conversions by the population (the number of people who have seen the test).

Conversion Rate = Conversions / Population
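To make this concrete, here is a minimal sketch in Python; the visitor and conversion counts are hypothetical, chosen purely for illustration:

def conversion_rate(conversions, population):
    # Fraction of the people who saw the test that went on to convert.
    return conversions / population

# Hypothetical results: variant A converted 120 of 1,000 visitors,
# variant B converted 150 of 1,000 visitors.
rate_a = conversion_rate(120, 1000)   # 0.12
rate_b = conversion_rate(150, 1000)   # 0.15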

At this point, you could compare the conversion rates for each of your tests, declare a winner and call it a day. So, why wouldn’t you? To answer a question with a question, how can you be sure your winning test wasn’t due to luck? Enter statistical significance.

Statistical significance measures how likely it is that your split test’s result was due to random chance. Another way of thinking about this is that the smaller the probability that the result was random, the more confident you can be that your changes are what caused the result.

What you need to determine statistical significance:

Probability value

Before you calculate the statistical significance of your tests, it is best practice to first establish a Probability Value (p-value). This is the probability below which you reject the possibility that your results are due to luck and declare your test statistically significant.

As with most things, there is a tradeoff here: the lower the p-value, the more certain you can be, but tests will take longer to reach statistical significance. Due to this tradeoff, different industries and applications use different p-values. For example, in medical and pharmaceutical applications, where a high degree of confidence is desirable, you might use a p-value of 0.0001 (99.99% confidence). For general research, you might set your p-value between 0.01 and 0.05, and for marketing you might set it between 0.01 and 0.2. As long as you understand that a higher p-value means a lower confidence level, the choice is yours.
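In practice, this choice amounts to nothing more than picking a constant up front; the values below are hypothetical, drawn from the ranges mentioned above:

# Example thresholds; pick the one that matches your tolerance for risk.
P_VALUE_PHARMA    = 0.0001   # 99.99% confidence
P_VALUE_RESEARCH  = 0.05     # 95% confidence
P_VALUE_MARKETING = 0.10     # 90% confidence

p_value_threshold = 0.05     # the threshold used in the sketches below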

Null hypothesis

As you might have noticed already, statistics is rather odd in that it makes you think in double negatives; you want a low probability that your test results were due to random chance. So why can’t you just say that you want a high probability that the changes you’re testing (rather than chance) made a difference?

In hypothesis testing, you accept your alternative hypothesis Ha by rejecting the null hypothesis H0. In the case of conversion rates, your null hypothesis will very often be “my changes will have no impact on the outcome of the test”.

Normal distribution

Once you’ve determined what your null hypothesis is, the next step is to calculate the inputs for the normal distribution of the null hypothesis. To calculate the normal distribution, you need to know the mean and the variance (from which the standard deviation is derived). Now, because your null hypothesis was that there would be no difference between the two tests, the mean is 0. This looks like:

Conversion Rate A = Conversion Rate B

Conversion Rate A - Conversion Rate B = 0

The variance can be found by combining the results for both tests (the null hypothesis says they’re equal) and applying the Central Limit Theorem, yielding the following equations:

Conversion Rate Total = (Conversions A + Conversions B) / (Population A + Population B)

Variance = (Conversion Rate Total * (1 - Conversion Rate Total) * (Population A + Population B)) / (Population A * Population B)

Standard Deviation = √Variance
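Assuming the same hypothetical counts as before, those three formulas might look like this in Python (a sketch, not a prescribed implementation):

import math

def null_distribution(conversions_a, population_a, conversions_b, population_b):
    # Pooled conversion rate across both tests; the null hypothesis treats
    # them as draws from the same underlying rate.
    pooled_rate = (conversions_a + conversions_b) / (population_a + population_b)
    # Variance and standard deviation of the difference in conversion rates
    # under the null hypothesis.
    variance = (pooled_rate * (1 - pooled_rate) *
                (population_a + population_b)) / (population_a * population_b)
    return pooled_rate, variance, math.sqrt(variance)

pooled_rate, variance, std_dev = null_distribution(120, 1000, 150, 1000)
# pooled_rate is about 0.135, std_dev about 0.0153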

Standard score

Almost there! The standard score, more commonly known as the z score, is a measure of how many standard deviations away from the mean a value is.

Standard score = (value - mean) / standard deviation

The z score is used in conjunction with a normal distribution cumulative probability lookup to determine the probability of a value occurring. Given that the mean is zero (due to the null hypothesis) and the value is ± (Conversion Rate A - Conversion Rate B), this becomes:

Standard score = ± (Conversion Rate A - Conversion Rate B) / standard deviation
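Continuing the sketch, the z score for the hypothetical counts is simply the observed difference divided by the standard deviation computed earlier:

rate_a = 120 / 1000                       # 0.12
rate_b = 150 / 1000                       # 0.15
std_dev = 0.0153                          # carried over from the previous snippet
z_score = (rate_b - rate_a) / std_dev     # roughly 1.96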

Statistical significance

Now that you have calculated the standard score, the final step is to look up the cumulative probabilities at +z and -z, subtract the smaller from the larger, and subtract that result from one. This gives the probability that a difference at least as large as the one you observed was due to random chance. If that probability is below the p-value you chose at the start, you can reject the null hypothesis and declare your result statistically significant.
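To close the loop, here is one way that lookup might be done in Python, using the standard normal cumulative distribution function (via math.erf) instead of a printed z table; the z score and threshold below are the hypothetical values used throughout:

import math

def normal_cdf(x):
    # Cumulative probability of the standard normal distribution.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def chance_probability(z_score):
    # Per the steps above: take the cumulative probabilities at +z and -z,
    # subtract the smaller from the larger, then subtract that from one.
    # The result is the probability that a difference at least this large
    # was due to random chance.
    z = abs(z_score)
    return 1.0 - (normal_cdf(z) - normal_cdf(-z))

p = chance_probability(1.96)   # roughly 0.05 for the hypothetical z score
is_significant = p < 0.05      # significant at a 0.05 threshold, but not at 0.01

If you would rather not write your own lookup, scipy.stats.norm.cdf provides the same cumulative probability.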