When you run a website, you need to constantly improve and update it to increase its value to both your business and your customers. A methodical way to do this is to run controlled experiments.
There are two methods that allow you to run controlled experiments: (1) traditional A/B testing, and (2) multi-armed bandit testing (MAB). Let’s look at traditional A/B testing first.
Traditional A/B Testing
The traditional A/B testing method pits two or more versions of a page against each other. To do this properly, you need to know the current conversion rate of your control, and you need to decide how much of a change you want to detect in your A/B test.
For example, let's say you want to know if Version A converts at least 10% better than your original (also known as the control). A sample size calculation based on your current conversion rate and that 10% target then gives you a specific sample size, which is the number of visitors who must see your experiment.
You must run the experiment until you reach the sample size. You cannot make any statistically valid decisions until then. Afterward, you will only know whether the variant was at least 10% better or not. You cannot accurately tell whether it was 5% or 20% better. To do that, you would need to calculate a new sample size and run the experiment again.
Detecting a smaller change (such as 5%) requires a larger sample size, while detecting a larger change (such as 20%) requires a smaller sample size and gives you a quicker experiment. If your version was, say, 19% better, you wouldn’t accurately know that (you would have to re-run the experiment with a sample size for 19%). If you test more than one variant, the sample size is larger still.
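To make the relationship between the detectable change and the sample size concrete, here is a minimal sketch of a common approximation for a two-sided two-proportion z-test. Everything specific in it is an assumption for illustration: the 5% baseline conversion rate, the reading of "10% better" as a relative lift, the 5% significance level, the 80% power, and the function name `sample_size_per_variant`. Real sample size calculators may use slightly different formulas and return somewhat different numbers.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant (two-sided two-proportion z-test)."""
    p1 = baseline_rate
    # Assumption: the lift is relative, so 10% better than 5% means 5.5%.
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_power = NormalDist().inv_cdf(power)          # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Hypothetical control converting at 5%.
    for lift in (0.05, 0.10, 0.20):
        n = sample_size_per_variant(baseline_rate=0.05, relative_lift=lift)
        print(f"detect a {lift:.0%} lift: about {n} visitors per variant")
```

Because the difference between the two rates is squared in the denominator, halving the change you want to detect roughly quadruples the number of visitors you need.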
While some people stop experiments as soon as statistical significance is reached, this is a mistake: before the planned sample size is reached, a result can look significant purely by chance. People who stop early risk choosing the wrong winner and being stuck with a variant that underperforms other options, thus losing sales.
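The risk of stopping early can be illustrated with a small simulation. The sketch below runs a hypothetical A/A test, in which both versions have the same true conversion rate, so every "significant" result is a false positive. It compares checking once at the planned sample size against checking at regular intervals and stopping at the first significant reading. The conversion rate, sample size, checking interval, and function names are all assumptions chosen for illustration.

```python
import math
import random


def z_stat(conv_a, conv_b, n):
    """Pooled two-proportion z statistic for two arms with n visitors each."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return 0.0
    return (conv_a / n - conv_b / n) / se


def false_positive_rates(rate=0.05, n_per_arm=2000, look_every=100, runs=300, seed=0):
    """Return (rate with repeated peeking, rate with a single final look)."""
    rng = random.Random(seed)
    peek_hits = 0
    final_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        crossed_at_some_look = False
        for i in range(1, n_per_arm + 1):
            # Both arms share the same true rate: any "winner" is noise.
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            if i % look_every == 0 and abs(z_stat(conv_a, conv_b, i)) > 1.96:
                crossed_at_some_look = True
        peek_hits += crossed_at_some_look
        final_hits += abs(z_stat(conv_a, conv_b, n_per_arm)) > 1.96
    return peek_hits / runs, final_hits / runs


if __name__ == "__main__":
    peeking, single_look = false_positive_rates()
    print(f"false positives when stopping at first significance: {peeking:.1%}")
    print(f"false positives with one look at the planned size:   {single_look:.1%}")
```

Since the final look is one of the interim looks here, the peeking rate can never be lower than the single-look rate, and in practice it tends to be several times higher.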
Now, let's look at multi-armed bandit testing.
Multi-Armed Bandit Testing
The best way to understand multi-armed bandit testing is through an example. Like traditional A/B tests, multi-armed bandit tests pit at least two versions of a page against each other. At first, your traffic is split evenly between Version A and Version B.
As the experiment runs, the bandit algorithm continuously analyzes how each variant is performing and uses this information to change the fraction of traffic that each variant gets. High-performing variants get more traffic, and underperforming variants get less. As the experiment continues to run, the traffic distribution continues to change.
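The traffic shifting described above can be sketched with Thompson sampling, one widely used bandit allocation rule. This article does not say which algorithm a given testing tool uses, so treat the choice of Thompson sampling, the uniform Beta(1, 1) starting belief, the true conversion rates, and the names `ThompsonSampler` and `simulate_traffic` as assumptions made for illustration.

```python
import random


class ThompsonSampler:
    """Thompson sampling over variants with a Beta(1, 1) prior on each rate."""

    def __init__(self, arms, seed=None):
        self.rng = random.Random(seed)
        self.successes = {arm: 0 for arm in arms}
        self.failures = {arm: 0 for arm in arms}

    def choose(self):
        # Draw a plausible conversion rate for each variant from its posterior
        # and show the variant with the highest draw.
        samples = {
            arm: self.rng.betavariate(self.successes[arm] + 1, self.failures[arm] + 1)
            for arm in self.successes
        }
        return max(samples, key=samples.get)

    def update(self, arm, converted):
        if converted:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


def simulate_traffic(true_rates, visitors, seed=0):
    """Count how many simulated visitors each variant receives."""
    rng = random.Random(seed)
    bandit = ThompsonSampler(true_rates, seed=seed + 1)
    traffic = {arm: 0 for arm in true_rates}
    for _ in range(visitors):
        arm = bandit.choose()
        traffic[arm] += 1
        bandit.update(arm, rng.random() < true_rates[arm])
    return traffic


if __name__ == "__main__":
    # Hypothetical true rates; in a real test these are unknown.
    print(simulate_traffic({"A": 0.05, "B": 0.10}, visitors=5000))
```

With no data, both variants are equally likely to be shown, which matches the even split at the start. As conversions accumulate, the better variant wins most of the draws, while the weaker one still gets occasional traffic in case its early results were bad luck.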
Advantages of Multi-Armed Bandit Testing
Multi-armed bandit tests have four advantages over traditional A/B tests:
- You don't have to be a data scientist to use them. You do not need to know your current conversion rate or pick a sample size before running your experiment.
- They maximize returns. In a traditional A/B test, if you have one variant that is underperforming (and losing you sales), you have to keep running it until the experiment is concluded (or you invalidate the data). Ending it early might mean you choose the wrong winner. However, because MAB constantly adjusts, you minimize any losses from underperforming variants (they get less and less traffic) without the risk of turning them off completely (the variant that underperforms in the beginning might be the best performing in the long run).
- They are more flexible. You can add new variants and stop underperforming variants while the test is running without invalidating the experiment.
- They are adaptive. Some variations perform better, not because they are necessarily better, but because they are new. This can lead to invalid experiments. Because MABs can keep running indefinitely, they can adapt as this novelty effect wears off or as customers' tastes change over time.
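The last two points, changing the set of variants mid-test and adapting to shifting behavior, can be sketched with a simple epsilon-greedy bandit that gradually forgets old observations. This is only one possible design and not necessarily what any particular tool does: the class name `DecayingEpsilonGreedy`, the `epsilon` and `decay` parameters, and the optimistic treatment of brand-new variants are all assumptions made for illustration.

```python
import random


class DecayingEpsilonGreedy:
    """Epsilon-greedy bandit whose past observations fade over time."""

    def __init__(self, arms, epsilon=0.1, decay=0.99, seed=None):
        self.rng = random.Random(seed)
        self.epsilon = epsilon  # share of traffic spent exploring at random
        self.decay = decay      # weight kept by old data after each new visitor
        self.shown = {arm: 0.0 for arm in arms}
        self.converted = {arm: 0.0 for arm in arms}

    def add_arm(self, arm):
        # A new variant can join while the test is running.
        self.shown[arm] = 0.0
        self.converted[arm] = 0.0

    def remove_arm(self, arm):
        # A variant can be stopped without touching the others.
        del self.shown[arm]
        del self.converted[arm]

    def rate(self, arm):
        # Variants with no data get an optimistic estimate so they are tried.
        if self.shown[arm] == 0:
            return 1.0
        return self.converted[arm] / self.shown[arm]

    def choose(self):
        arms = list(self.shown)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(arms)
        return max(arms, key=self.rate)

    def update(self, arm, converted):
        # Fade all earlier observations, then record the new one at full weight.
        for name in self.shown:
            self.shown[name] *= self.decay
            self.converted[name] *= self.decay
        self.shown[arm] += 1.0
        self.converted[arm] += 1.0 if converted else 0.0
```

Because recent visitors carry more weight than old ones, a variant that only won while it was new loses its lead once its recent results decline, and a variant whose results improve later can regain traffic.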