What is A/B Testing?
Lately, I’ve been trying to round out my analytics skillset by refreshing some statistical concepts that are vital to data science. These topics range from the central limit theorem and the assumptions of linear regression to what constitutes statistical significance and why comparing a p-value to a significance level helps control Type I errors. One skill that is vital to doing analytics and data science professionally, most notably in medicine and technology, is designing, performing, and interpreting A/B tests.
A/B testing is a statistical way of comparing two or more versions of a product or experience. It’s integral to how a company understands its customers as well as its products. Whether in your email inbox or in the advertisements recommended to you on social media, chances are you were involved in an A/B test in the past month. In the tech field, A/B testing is the go-to way of trying out small changes to evaluate whether they’ll make a product more usable. For example, would changing the color of the “Create Account” button on a popular website increase the proportion of new accounts to unique users? In advertising, A/B testing is used to decide which advertisements are run in which regions, while in medicine, it helps evaluate new drugs.
A/B testing does a great job of helping data scientists and engineers understand the significance, value, and cost of these changes, but it’s less adept at handling larger changes like a design refresh. Former Mozilla CEO John Lilly had a great analogy for it: A/B testing is really useful for helping you climb to the peak of your current mountain, but isn’t so useful for deciding which mountain you want to be on. (Although Netflix has since implemented their own form of A/B testing for these types of large changes, which they’ve dubbed “Mountain Testing”.)
Since the mathematics behind an A/B test is basic hypothesis testing, the remainder of this post will focus on the questions you should ask when setting up an A/B test.
In setting up your experiment, ask yourself the following questions:
1. Can an A/B test accurately reflect the effect of the change? Does the variant look like a new version of the same product, or like a totally different product? Most people react adversely to unfamiliar layouts of products they already know. Keeping the limited scope of an A/B test in mind will also help weed out extraneous factors.
2. What sample size is needed to keep your chance of a Type I error (alpha) low and the power of your result (1 − beta) high? You obviously would not want to give the alternative to more customers than necessary, especially if the alternative performs poorly. For my practice, I’ve used this calculator; a sketch of the same calculation in Python follows this list.
3. What is your key evaluation metric? Is it the number of unique clicks, or the proportion of unique clicks to unique views?
4. State your null and alternative hypotheses. This tells you exactly what you’re rejecting (or failing to reject) when you run your one-tailed or two-tailed hypothesis test; see the second sketch below for an example.
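
To make question 2 concrete, here is a minimal sketch of a sample-size (power) calculation using statsmodels. The baseline conversion rate, minimum detectable lift, alpha, and power below are made-up numbers for illustration, not recommendations.

```python
# Sketch: how many users per variant do we need for a given alpha and power?
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline_rate = 0.10   # assumed current conversion rate (control) -- illustrative
target_rate = 0.12     # smallest lift worth detecting (treatment) -- illustrative
alpha = 0.05           # acceptable Type I error rate
power = 0.80           # desired power, i.e. 1 - beta

# Cohen's h effect size for the two proportions
effect_size = proportion_effectsize(baseline_rate, target_rate)

# Required sample size per variant for a two-sided test with equal group sizes
n_per_variant = NormalIndPower().solve_power(
    effect_size=effect_size,
    alpha=alpha,
    power=power,
    ratio=1.0,
    alternative="two-sided",
)
print(f"Need roughly {n_per_variant:.0f} users per variant")
```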
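And for questions 3 and 4, here is a sketch of what the final evaluation might look like if the key metric is the proportion of unique clicks to unique views and the test is a two-sided two-proportion z-test. The click and view counts are fabricated for illustration.

```python
# Sketch: two-proportion z-test.
# H0: the click-through proportions of A and B are equal.
# H1: the click-through proportions differ (two-sided test).
from statsmodels.stats.proportion import proportions_ztest

clicks = [310, 355]    # unique clicks in control (A) and treatment (B) -- fabricated
views = [5000, 5000]   # unique views in each variant -- fabricated

z_stat, p_value = proportions_ztest(count=clicks, nobs=views, alternative="two-sided")

alpha = 0.05
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
if p_value < alpha:
    print("Reject the null hypothesis: the variants differ.")
else:
    print("Fail to reject the null hypothesis.")
```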
That’s it for this week. Stay tuned for more data science topics next week!