Affiliates have used the Bayesian inference method for quite a long time to decide which offer or lander is performing best. Since few of us are math geniuses or have a PhD in statistics, we have believed that this method is the gold standard when it comes to split testing.
Bayesian inference is without a doubt one of the best-known methods we can use to calculate the probability of A being better than B, but this is true only when we analyze static data.
As we all know, the data we deal with is very dynamic. A campaign’s performance will change over time and along with that our optimization decisions should change as well. Unfortunately, the Bayesian method doesn’t take this into account and hence this method is not suitable any more.
If we take a closer look at the Bayesian inference method, we can see that “the moment” doesn’t exist. So basically this method totally ignores when a specific metric was measured. As a consequence, our data loses an important piece of information, and the suggestions we make using this method aren’t reliable any more. Indeed, these results may lead us in the wrong direction!
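To make this concrete, here is a minimal Python sketch of one common way to estimate the probability of A beating B (a Beta-Binomial model with uniform priors). It is not the calculator used in this post, and the function name `prob_a_beats_b` and all the numbers are made up for illustration. Only the pooled totals go in, so a path that is dying and a path that is picking up get exactly the same score if their totals match.

```python
import random

def prob_a_beats_b(clicks_a, conv_a, clicks_b, conv_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_A > rate_B) under uniform Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        a = rng.betavariate(1 + conv_a, 1 + clicks_a - conv_a)
        b = rng.betavariate(1 + conv_b, 1 + clicks_b - conv_b)
        wins += a > b
    return wins / draws

def pooled(days):
    """Collapse daily (clicks, conversions) pairs into overall totals."""
    return sum(c for c, _ in days), sum(v for _, v in days)

# Hypothetical daily (clicks, conversions) for one path, trending downward.
daily = [(1000, 30), (1000, 25), (1000, 20), (1000, 15), (1000, 10)]
# Hypothetical pooled totals for the competing path.
rival = (5000, 90)

declining = prob_a_beats_b(*pooled(daily), *rival)
# Same days in reverse order: now the path is trending upward.
improving = prob_a_beats_b(*pooled(list(reversed(daily))), *rival)

print(declining, improving)  # identical: the order of the days never enters the model
```

Reversing the days turns a declining path into an improving one, yet both calls return the same probability, because the time order is thrown away before the calculation starts.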
A better alternative for calculating the probability of A being better than B is the Poisson process. With this approach we can calculate the total number of successes (conversions) happening in a specific time interval (where time can be swapped for clicks or budget).
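For readers who have not met the Poisson process before, here is a small Python sketch of the textbook Poisson formula for the number of conversions in an interval. It assumes a constant conversion rate over the interval, which is a simplification, and it is not the algorithm described later in this post. The function names and the numbers are hypothetical.

```python
import math

def poisson_pmf(k, rate, interval):
    """P(exactly k conversions) when conversions arrive at `rate` per unit of interval."""
    lam = rate * interval  # expected number of conversions in the interval
    return math.exp(-lam) * lam ** k / math.factorial(k)

def prob_at_least(k, rate, interval):
    """P(k or more conversions) in the interval."""
    return 1.0 - sum(poisson_pmf(i, rate, interval) for i in range(k))

# Hypothetical: 0.02 conversions per click, over an interval of 500 clicks,
# so 10 conversions are expected on average.
print(poisson_pmf(10, 0.02, 500))
print(prob_at_least(10, 0.02, 500))
```

The interval here is measured in clicks, but the same function works if the rate is expressed per hour or per dollar of budget.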
To give you a better idea of what I am talking about, let’s go through a simple example where we put the Bayesian inference method to the test.
The Test
In this example we will analyze the 7 days’ performance of a campaign that contains two paths that perform differently (stats below).

Let’s find out which is the best path using the Bayesian inference A/B test calculator:

The Bayesian inference equation suggests focusing our campaign’s traffic on Path 1, which has an 80.68% probability of being the best path, compared to only 19.32% for Path 2. Simple, right?
Now let’s see the performance of these two paths on a day-to-day basis.

If numbers in the above table are not enough, let’s see the daily performance trends in a graph.

As we can clearly see from the graphs, Path 1’s performance is decreasing over time while Path 2’s performance is increasing.
So the Bayesian suggestion, that Path 1 has a higher probability of being the best path than Path 2, is totally wrong!
Let’s dive into more details on why this suggestion is wrong!
In a real-world scenario, this could be a first offer that is dying due to regulations etc., while the second one is picking up after the advertiser has tweaked their flow.
To which one would you send traffic?
The initial success of Path 1 suggests driving all traffic to it, thus losing the opportunity to drive traffic to Path 2. But after the 2nd day, Path 2 is showing higher potential. As we already mentioned earlier in this article, Bayesian inference does not take into consideration when a specific metric was measured. Therefore, it is important to understand that the Bayesian inference method cannot detect the moment when Path 2 becomes better than Path 1.
In the best-case scenario, Bayesian inference would have suggested the correct answer only once the overall performance of Path 2 had surpassed the overall performance of Path 1. But that would require quite a lot more time and ad spend to reach this hypothetical result.
If we had trusted the suggestions given by the Bayesian method for the selected stats interval, we would have focused our traffic on Path 1 and disabled Path 2, which would never have had the chance to show its potential.
It is clear that relying on the suggestion of a wrong method leads to a total campaign failure.
Solution
During the last 12 months we have been working on a proprietary algorithm which analyzes in detail each and every metric of a campaign. Based on every click in time, it spots trends and “learns” to squeeze the maximum ROI out of your campaigns. This means automatically blocking that bad publisher that is bleeding money when you are sleeping or stopping traffic to an offer / lander that suddenly stopped converting.
It requires very little traffic to start doing its job, and with every click it continuously improves its decisions. And the best part of all is that it is tracker and traffic source agnostic. So you can start using it regardless if you run pops tracking with
If you are curious and want more on how it works, just post here and we will get the math genius who built the algorithm to chime in.
Interesting share, good points.
Anyway, wouldn't the solution then be just to take fresher data into consideration? I.e. not take the whole lifetime of both paths, but just the last x days (or x hours), so the data is closer to the actual outcomes?
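A rough Python sketch of that "last x days" idea, using made-up daily numbers (the original stats table is not reproduced here) and a hypothetical helper `windowed_totals`:

```python
def windowed_totals(daily, window):
    """Sum (clicks, conversions) over the most recent `window` days only."""
    if window <= 0:
        raise ValueError("window must be a positive number of days")
    recent = daily[-window:]
    return sum(c for c, _ in recent), sum(v for _, v in recent)

# Hypothetical daily (clicks, conversions): Path 1 declining, Path 2 improving.
path1 = [(1000, 30), (1000, 26), (1000, 21), (1000, 17), (1000, 12), (1000, 9), (1000, 6)]
path2 = [(1000, 5), (1000, 9), (1000, 12), (1000, 16), (1000, 19), (1000, 22), (1000, 25)]

for name, days in (("Path 1", path1), ("Path 2", path2)):
    life_clicks, life_conv = windowed_totals(days, len(days))
    win_clicks, win_conv = windowed_totals(days, 3)
    print(name, "lifetime CR:", life_conv / life_clicks, "last 3 days CR:", win_conv / win_clicks)
```

With these numbers the lifetime totals still favor Path 1, while the 3-day window favors Path 2. The windowed totals can then be fed into whatever A/B calculation you already use.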
You make an interesting point, but the point you make is actually not the point that you think you are making.
In a basic A/B split test, the whole point is to test one independent variable and one independent variable only, while keeping all other variables constant. In the trade, this is called the ceteris paribus principle.
Here you are actually introducing a second variable which is temporal.
If you want to measure this temporal "velocity", then the output that you need to be measuring as your y dependent variable is conversion growth, and not raw conversions, and you need to make sure you are measuring geometric growth (multiplicative) and not simply arithmetic (additive).
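To illustrate the difference between the two kinds of growth, here is a tiny Python sketch with made-up conversion counts. The function names are just for illustration.

```python
def geometric_growth(series):
    """Average multiplicative growth factor per period, first value to last."""
    periods = len(series) - 1
    return (series[-1] / series[0]) ** (1 / periods)

def arithmetic_growth(series):
    """Average additive change per period, first value to last."""
    return (series[-1] - series[0]) / (len(series) - 1)

# Hypothetical daily conversions that double every day.
conversions = [10, 20, 40, 80]

print(geometric_growth(conversions))   # about 2.0, i.e. doubling each day
print(arithmetic_growth(conversions))  # about 23.3 extra conversions per day
```

The additive figure hides the fact that the series doubles every day, which is why the multiplicative measure is the one to watch for a trend.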
Affiliates are using naive Bayes because no one is testing for independence of the variables. But actually there is a strong correlation between the variables, which alters the results. We need to combine numerical analysis with the statistical one (in this case binomial, or Poisson process). And with Poisson you don't need to analyse the "conversion growth" because the changes are already predicted in the analysis...
Which ML model do you use for learning?
Why do you need a proprietary algorithm? Wouldn't you achieve the same by using a sliding window over the data?
I don't think Bayesian failed in the case outlined above; the thing that failed is your time window for analysis.
If the data was analysed on a daily basis, with a reserve % (e.g. 10%) of the traffic share going to the nonperformer, you would get much better path splitting.
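A quick Python sketch of that daily re-split with a reserve share. The daily conversion rates are invented, `daily_split` is a hypothetical helper, and picking the winner by the previous day's raw rate is a deliberate simplification (you could plug in any daily A/B calculation instead).

```python
def daily_split(rate_a, rate_b, reserve=0.10):
    """Give the better path (1 - reserve) of the traffic and the other path `reserve`."""
    if rate_a >= rate_b:
        return 1.0 - reserve, reserve
    return reserve, 1.0 - reserve

# Hypothetical observed daily conversion rates for the two paths.
path1_rates = [0.030, 0.026, 0.021, 0.017, 0.012, 0.009, 0.006]
path2_rates = [0.005, 0.009, 0.012, 0.016, 0.019, 0.022, 0.025]

# Each day's split is decided from the previous day's observed rates.
schedule = [(0.5, 0.5)]  # no data yet on day 1, so split evenly
for r1, r2 in zip(path1_rates[:-1], path2_rates[:-1]):
    schedule.append(daily_split(r1, r2))

for day, (s1, s2) in enumerate(schedule, start=1):
    print(f"day {day}: Path 1 {s1:.0%}, Path 2 {s2:.0%}")
```

Because the nonperformer always keeps its 10%, there is fresh data on it every day, so the split can flip as soon as it starts doing better.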