
Getting Started with Content Testing and Bandito

Content testing through Bandito allows you to maximize audience engagement by testing different variants of content in the same feature.

For example, you may start with a hypothesis that a headline about a particular sports team losing an event (Team B Narrowly Defeated!) will trigger more clicks than a headline about the other team winning (Team A Wins Big Game!). To find out which headline performs better, you have to run an experiment in which both variants are presented to your audience and their performance is measured. Once you have enough data to decide which variant is performing better, you can declare a winner and ensure your entire audience sees the winning content.

Bandito implements a type of content test called a multi-armed bandit.

Experiment Phases: Exploring and Exploiting

Content testing experiments can be thought of as having two phases: exploration and exploitation. In the exploration phase of an experiment, your goal is to learn about audience behavior. In the exploitation phase, your aim is to use what you’ve learned during exploration to make smarter decisions about content; in other words, you exploit your results.

Different types of content tests strike a different balance between exploration and exploitation. Bandito learns about audience behavior continuously rather than only at the end of the experiment, which allows it to spend more of the experiment exploiting what it has learned.

How does Bandito learn which content to show?

Bandito tracks two key pieces of data: the number of times each variant is served (serves) and the number of times a variant is clicked (clicks). Together they define each variant’s click-through rate (CTR), which is clicks divided by serves, and they form the basis of your experiment. Bandito employs a method called a multi-armed bandit (MAB) test. MABs observe audience behavior on each variant throughout the experiment, dynamically routing more traffic to the better-performing variants as time goes on. Eventually, Bandito declares a winner of the experiment by tracking how long a variant has been performing significantly better than the others.
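As a rough mental model (not Bandito’s actual data model), each variant’s state can be pictured as a pair of counters with CTR derived from them. The field and function names below are illustrative, not part of Bandito’s API.

```typescript
// Hypothetical per-variant bookkeeping; field names are illustrative.
interface VariantStats {
  id: string;
  serves: number; // times this variant was shown
  clicks: number; // times this variant was clicked
}

// Click-through rate: clicks divided by serves (0 when nothing has been served yet).
function ctr(v: VariantStats): number {
  return v.serves === 0 ? 0 : v.clicks / v.serves;
}
```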

What is an A/B Test?

In order to explain Bandito’s algorithm, we’ll first talk about another common form of content testing – the A/B Test.

An A/B Test begins with a set of known variants. For example, perhaps you have two headlines, “Team A Wins Big Game!” and “Team B Narrowly Defeated!”, and you’d like to know which performs better on your audience. In an A/B Test, 50% of your audience sees each of these variants for as long as the experiment is active; that duration is defined by the experimenter. Once enough serves and clicks have been collected, the CTR of each variant can be calculated and a winner can be declared. At that point, knowing the winner, traffic can be routed 100% to the winning variant.
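A minimal sketch of the A/B flow described above, assuming a simple 50/50 random split and reusing the hypothetical `VariantStats` and `ctr` helpers sketched earlier, might look like this:

```typescript
// Serve phase: each visitor is randomly assigned one of the two variants.
function pickAbVariant(a: VariantStats, b: VariantStats): VariantStats {
  return Math.random() < 0.5 ? a : b;
}

// Decision phase, run only after the experiment ends: compare CTRs and
// route all future traffic to the winner.
function abWinner(a: VariantStats, b: VariantStats): VariantStats {
  return ctr(a) >= ctr(b) ? a : b;
}
```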

In this A/B Test, the entire experiment is spent exploring—that is, learning about audience behavior on each variant. Only after the experiment ends are you able to start exploiting what you have learned by pointing all traffic at the winning variant.

How is Bandito Different?

Bandito, a multi-armed bandit implementation, instead tries to optimize for exploitation.

Unlike an A/B Test, Bandito is able to route traffic to high-performing variants and declare a winner without manual intervention. Left entirely alone, Bandito will eventually route nearly all traffic to the winning variant.

To determine which variant to show at any given moment, Bandito combines two numbers: an exploitation score and an exploration score. The exploitation score is the variant’s current CTR, a measure of how well the variant is performing right now; the better a variant performs, the more we want to exploit it by showing it to our audience. The exploration score is a measure of the variant’s variance, which tells us how confident the experiment is that it knows enough about this variant. The higher the variance, the less confident the experiment is, and the more it will want to explore that variant further.

Bandito selects the variant with the highest total after adding these two scores together. At the beginning of an experiment, variance is high across all variants, so the exploration score usually determines which variant is served. Over time, as the experiment learns more about every variant, the variance scores drop and the experiment begins to serve based entirely on how well each variant is performing.
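Bandito’s exact scoring formula isn’t documented here, so the sketch below (reusing the `VariantStats` and `ctr` helpers from earlier) uses a standard UCB-style uncertainty bound as an assumed stand-in for the exploration score; only the overall shape, CTR plus an uncertainty term with the highest sum served, mirrors the description above.

```typescript
// Exploitation: the variant's current click-through rate.
function exploitationScore(v: VariantStats): number {
  return ctr(v);
}

// Exploration: an uncertainty bound that is large for rarely served variants
// and shrinks as serves accumulate. An assumed stand-in, not Bandito's formula.
function explorationScore(v: VariantStats, totalServes: number): number {
  if (v.serves === 0) return Number.POSITIVE_INFINITY; // unseen variants get explored first
  return Math.sqrt((2 * Math.log(totalServes)) / v.serves);
}

// Serve the variant with the highest combined score.
function pickBanditVariant(variants: VariantStats[]): VariantStats {
  const totalServes = variants.reduce((sum, v) => sum + v.serves, 0);
  return variants.reduce((best, v) =>
    exploitationScore(v) + explorationScore(v, totalServes) >
    exploitationScore(best) + explorationScore(best, totalServes)
      ? v
      : best
  );
}
```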

Adding and Removing Variants

In an A/B Test, all variants are known at the beginning of the experiment. With Bandito, new variants can be introduced at any point. For example, hours into an experiment, you may learn new information that makes you think a third headline, “Team A Cheated!”, will actually perform the best.

Because traffic to each variant is determined through a combination of exploring and exploiting, variants added midway through an experiment will have a high exploration score and will begin to receive traffic immediately. Eventually, given enough clicks, the experiment will swing back to exploiting the best-performing variant.
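To illustrate with made-up numbers, a variant added mid-experiment starts with zero serves, so under the selection sketch above its exploration score dominates and it is served immediately:

```typescript
// Two established variants plus a freshly added third one (numbers are invented).
const variants: VariantStats[] = [
  { id: "team-a-wins", serves: 6000, clicks: 390 },
  { id: "team-b-defeated", serves: 4000, clicks: 220 },
  { id: "team-a-cheated", serves: 0, clicks: 0 }, // just added
];

// With zero serves, the new variant's exploration score is effectively unbounded,
// so it wins the next selection and starts collecting its own data right away.
console.log(pickBanditVariant(variants).id); // "team-a-cheated"
```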

Declaring a Winner

While Bandito will eventually begin routing nearly all traffic to the best-performing variant, the test will still declare a winner once it can. Once a winner is declared, an Editor in PageBuilder Editor will be able to select this content as the new default.

Bandito chooses a winner when a single variant has been performing significantly better than all other variants for longer than 5 minutes. Every 20 seconds, Bandito calculates the significance of the best-performing variant. If that variant is performing significantly better, Bandito announces that convergence has begun, meaning the test is beginning to converge on a single winner. Once the same variant has been in convergence for over 5 minutes, it is declared the winner.

A few things can prevent a converging variant from becoming the winner:

  • A new variant could be introduced. If any variant in the test has fewer than 50 clicks, the test will wait for all variants to “catch up” before declaring a winner.

  • A different variant suddenly has a higher click rate. If another variant has a sudden spike in clicks during the 5-minute convergence period, it may start to outperform the converging variant. If at any point during convergence another variant performs better than the converging variant, convergence is canceled and must start over (see the sketch after this list).
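Putting the rules above together, one way to picture the winner-declaration check is the sketch below (again reusing the earlier `VariantStats` and `ctr` helpers). The timings and the 50-click threshold come from the description above; everything else, including the significance test passed in as a callback, is an assumption rather than Bandito’s actual code.

```typescript
const CHECK_INTERVAL_MS = 20_000;      // the caller runs this check every 20 seconds
const CONVERGENCE_WINDOW_MS = 300_000; // a variant must stay significantly ahead for 5 minutes
const MIN_CLICKS_PER_VARIANT = 50;     // all variants must "catch up" before a winner is declared

interface ConvergenceState {
  leaderId: string | null; // variant currently converging, if any
  since: number | null;    // timestamp when convergence began
}

// `isSignificantlyBetter` stands in for Bandito's significance test, which isn't specified here.
function checkForWinner(
  variants: VariantStats[],
  state: ConvergenceState,
  now: number,
  isSignificantlyBetter: (leader: VariantStats, rest: VariantStats[]) => boolean
): string | null {
  // Guard: a newly added variant with too few clicks blocks any declaration.
  if (variants.some((v) => v.clicks < MIN_CLICKS_PER_VARIANT)) return null;

  const leader = variants.reduce((best, v) => (ctr(v) > ctr(best) ? v : best));
  const rest = variants.filter((v) => v.id !== leader.id);

  if (!isSignificantlyBetter(leader, rest)) {
    // No significant leader: any in-progress convergence is canceled.
    state.leaderId = null;
    state.since = null;
    return null;
  }

  if (state.leaderId !== leader.id) {
    // A different variant took the lead, so the 5-minute clock starts over.
    state.leaderId = leader.id;
    state.since = now;
    return null;
  }

  // Same leader for the whole window: declare it the winner.
  return now - (state.since ?? now) >= CONVERGENCE_WINDOW_MS ? leader.id : null;
}
```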