Home > General > Affiliate Marketing Forum

STM Myth Busters - Bayesian Inference Method And When NOT to Use It (8)


04-10-2018 09:04 PM #1 platinum (Veteran Member)

Affiliates have long used the Bayesian inference method to decide which offer or lander is performing best. Since few of us are math geniuses or have a PhD in statistics, we have come to believe that this method is the gold standard for split testing.

Bayesian inference is certainly one of the best-known methods for calculating the probability of A being better than B, but this holds only when we analyze static data.

As we all know, the data we deal with is very dynamic. A campaign’s performance changes over time, and our optimization decisions should change with it. Unfortunately, the Bayesian method doesn’t take this into account, so it is no longer suitable.

If we take a closer look at the Bayesian inference method, we can see that “the moment” doesn’t exist in it: the method completely ignores when a specific metric was measured. As a consequence, our data loses an important piece of information, and the suggestions we get from this method are no longer reliable. Indeed, these results may lead us in the wrong direction!

A better alternative for calculating the probability of A being better than B would be the Poisson process. With this approach we can calculate the total number of successes (conversions) happening in a specific time interval (where “time” can also be expressed in clicks or budget).
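
For readers who want to see the idea in code, here is a minimal Python sketch. It assumes a constant conversion rate per unit of exposure (a homogeneous Poisson process), which is a simplification. The function names and the numbers are made up for illustration and do not come from any tracker or from the campaign discussed here.

```python
import math

def poisson_pmf(k, rate, exposure):
    """P(exactly k conversions) when conversions arrive at `rate` per unit of exposure."""
    mean = rate * exposure  # expected conversions in the interval
    return math.exp(-mean) * mean**k / math.factorial(k)

def prob_at_least(k, rate, exposure):
    """P(k or more conversions) in the interval."""
    return 1.0 - sum(poisson_pmf(i, rate, exposure) for i in range(k))

# Hypothetical: 1 conversion per 100 clicks, interval of 1000 clicks.
print(prob_at_least(5, rate=0.01, exposure=1000))
```

Here `exposure` can be days, clicks or budget, as long as `rate` is expressed in the same unit.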

To give you a better idea what I am talking about, let’s go through a simple example where we put the Bayesian inference method to the test.

The Test

In this example we will analyze the 7 days’ performance of a campaign that contains two paths that perform differently (stats below).


Let’s find out which is the best path using the Bayesian inference A/B test calculator:



The Bayesian inference calculation suggests focusing our campaign’s traffic on Path 1, which has an 80.68% probability of being the best path, compared to only 19.32% for Path 2. Simple, right?
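
For reference, calculators of this kind usually work roughly as sketched below. This assumes uniform Beta(1, 1) priors and Monte Carlo sampling; the calculator used here may differ in detail. The totals are hypothetical placeholders, not the campaign's real stats, and `prob_a_beats_b` is just an illustrative name.

```python
import random

def prob_a_beats_b(conv_a, clicks_a, conv_b, clicks_b, draws=20_000, seed=0):
    """Estimate P(rate_A > rate_B) under independent Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior of each path: Beta(1 + conversions, 1 + non-conversions).
        rate_a = rng.betavariate(1 + conv_a, 1 + clicks_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + clicks_b - conv_b)
        if rate_a > rate_b:
            wins += 1
    return wins / draws

# Hypothetical 7-day totals for two paths.
p = prob_a_beats_b(conv_a=60, clicks_a=7000, conv_b=52, clicks_b=7000)
print(f"P(Path 1 beats Path 2) ~ {p:.2%}")
```

Note that only totals go in. The order in which the conversions arrived plays no role in the result.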
Now let’s see the performance of these two paths on a day-to-day basis.



If numbers in the above table are not enough, let’s see the daily performance trends in a graph.



As we can clearly see from the graphs, Path 1’s performance is decreasing over time while Path 2’s performance is increasing.
The Bayesian suggestion that Path 1 has a higher probability of being the best path than Path 2 is totally wrong!

Let’s dive into more details on why this suggestion is wrong!

In a real-world scenario, the first offer might be dying due to regulations etc., while the second one is picking up after the advertiser has tweaked their flow.
To which one would you send traffic?
The initial success of Path 1 suggests driving all traffic to it, thus losing the opportunity to drive traffic to Path 2. But after the 2nd day, Path 2 shows higher potential. As mentioned earlier, Bayesian inference does not take into consideration when a specific metric was measured. Therefore, it is important to understand that the Bayesian inference method cannot tell us when Path 2 becomes better than Path 1.
In the best-case scenario, Bayesian inference would have suggested the correct answer only once the overall performance of Path 2 had surpassed the overall performance of Path 1. But that would require considerably more time and ad spend.
If we had trusted the suggestion given by the Bayesian method for the selected stats interval, we would have focused our traffic on Path 1 and disabled Path 2, which would never have had the chance to show its potential.
It is clear that relying on the suggestion of the wrong method can lead to total campaign failure.

Solution

During the last 12 months we have been working on a proprietary algorithm that analyzes each and every metric of a campaign in detail. Based on every click over time, it spots trends and “learns” to squeeze the maximum ROI out of your campaigns. This means automatically blocking that bad publisher that is bleeding money while you are sleeping, or stopping traffic to an offer / lander that suddenly stopped converting.
It requires very little traffic to start doing its job, and with every click it continuously improves its decisions. Best of all, it is tracker and traffic source agnostic, so you can start using it regardless of whether you run pops tracked with Voluum, native ads with FunnelFlux or display banners with Thrive.

If you are curious and want to know more about how it works, just post here and we will get the math genius who built the algorithm to chime in.


04-11-2018 04:09 AM #2 erikgyepes (Moderator)

Interesting share, good points.

Anyway, wouldn't the solution then be just to take fresher data into consideration? I.e. not take the whole lifetime of both paths, but just the last x days (x hours), so the data is closer to the actual outcomes?
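
A minimal Python sketch of that "last x days" idea, for illustration only. The daily numbers are hypothetical (the thread's real stats are not reproduced here) and `windowed_rate` is a made-up helper name.

```python
def windowed_rate(daily_clicks, daily_convs, window):
    """Conversion rate over the most recent `window` days only."""
    if window < 1:
        raise ValueError("window must be at least 1 day")
    clicks = sum(daily_clicks[-window:])
    convs = sum(daily_convs[-window:])
    return convs / clicks if clicks else 0.0

# Hypothetical data: path 1 fading, path 2 picking up, 1000 clicks a day each.
clicks = [1000] * 7
path1_convs = [14, 12, 10, 8, 7, 5, 4]
path2_convs = [3, 5, 6, 8, 9, 10, 11]

for window in (7, 3):
    r1 = windowed_rate(clicks, path1_convs, window)
    r2 = windowed_rate(clicks, path2_convs, window)
    print(f"last {window} days: path 1 = {r1:.2%}, path 2 = {r2:.2%}")
```

With these made-up numbers, path 1 leads over the full 7 days but path 2 leads over the last 3, so the chosen window decides the winner.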


04-11-2018 07:55 AM #3 cmdeal (Veteran Member)

You make an interesting point, but the point you make is actually not the point that you think you are making.

In a basic A/B split test, the whole point is to test one independent variable and one independent variable only, while keeping all other independent variables constant. In the trade, this is called the ceteris paribus principle.

Here you are actually introducing a second variable which is temporal.

If you want to measure this temporal "velocity", then the output that you need to be measuring as your y dependent variable is conversion growth, and not raw conversions, and you need to make sure you are measuring geometric growth (multiplicative) and not simply arithmetic (additive).
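
To make the arithmetic vs. geometric distinction concrete, here is a small Python sketch. The function names and the sample series are illustrative assumptions, not data from this thread.

```python
def arithmetic_growth(series):
    """Average additive change per period (e.g. +10.5 conversions a day)."""
    return (series[-1] - series[0]) / (len(series) - 1)

def geometric_growth(series):
    """Average multiplicative change per period, as a fraction (0.10 = +10%)."""
    return (series[-1] / series[0]) ** (1 / (len(series) - 1)) - 1

# Hypothetical daily conversions growing 10% a day.
daily_convs = [100, 110, 121]
print(arithmetic_growth(daily_convs))
print(geometric_growth(daily_convs))
```

Both functions need at least two data points, and the geometric version needs a non-zero first value.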


04-11-2018 11:27 AM #4 antuen (Senior Member)

Affiliates are using naive Bayes because no one is testing for independence of the variables. But there is actually a strong correlation between the variables, which alters the results. We need to combine numerical analysis with the statistical one (in this case binomial, or Poisson process). And with Poisson you don't need to analyse the "conversion growth" because the changes are already predicted in the analysis...


04-11-2018 12:00 PM #5 platinum (Veteran Member)

Quote: Originally posted by erikgyepes
Interesting share, good points.

Anyway, wouldn't the solution then be just to take fresher data into consideration? I.e. not take the whole lifetime of both paths, but just the last x days (x hours), so the data is closer to the actual outcomes?

Actually, defining an accurate date range is a big challenge in itself. If we choose a really wide date range, we fail to take into consideration the latest campaign/path trends (like the current performance). On the other hand, if we choose a really narrow date range, the data we base our decisions on may not be sufficient.

The best way would be to continuously evaluate campaign statistics, with the latest stats having a higher influence on our optimization decisions.
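
One simple way to let newer stats count more is exponential decay. The Python sketch below is only an illustration of that general idea and not the proprietary algorithm mentioned in this thread. The `decay` value, the helper name `decayed_rate` and the daily numbers are all assumptions.

```python
def decayed_rate(daily_clicks, daily_convs, decay=0.7):
    """Conversion rate where each older day counts `decay` times less than the next."""
    weighted_clicks = 0.0
    weighted_convs = 0.0
    # Walk from oldest to newest; every step shrinks what was seen so far.
    for clicks, convs in zip(daily_clicks, daily_convs):
        weighted_clicks = decay * weighted_clicks + clicks
        weighted_convs = decay * weighted_convs + convs
    return weighted_convs / weighted_clicks if weighted_clicks else 0.0

# Hypothetical data: path 1 fading, path 2 picking up, 1000 clicks a day each.
clicks = [1000] * 7
path1_convs = [14, 12, 10, 8, 7, 5, 4]
path2_convs = [3, 5, 6, 8, 9, 10, 11]

print(decayed_rate(clicks, path1_convs), decayed_rate(clicks, path2_convs))
```

With `decay=1.0` this reduces to the plain overall conversion rate, and smaller values shift the weight towards the most recent days.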


04-12-2018 09:49 PM #6 johner911 (Member)

Which ML model do you use for learning?

Why do you need a proprietary algorithm? Wouldn't you achieve the same by using a sliding window for the data?
I don't think Bayesian failed in the case outlined above; the thing that failed is your time window for analysis.

If the data was analysed on a daily basis with a reserve % (e.g. 10%) traffic share going to the nonperformer, you would get much better path splitting.
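
A rough Python sketch of such a reserve split, under the assumption that the leader gets everything except the reserve and the reserve is shared equally among the other paths. The rates and the helper name `split_weights` are hypothetical.

```python
def split_weights(estimated_rates, reserve=0.10):
    """Give the current leader most traffic; share `reserve` among the other paths."""
    leader = max(estimated_rates, key=estimated_rates.get)
    others = [path for path in estimated_rates if path != leader]
    if not others:
        return {leader: 1.0}  # a single path takes all traffic
    weights = {path: reserve / len(others) for path in others}
    weights[leader] = 1.0 - reserve
    return weights

# Hypothetical rates, re-estimated each day from that day's stats.
print(split_weights({"path1": 0.0053, "path2": 0.0100}))
```

Re-running this daily keeps the nonperformer alive on a small share of traffic, so it can take over if its numbers improve.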


04-13-2018 09:34 AM #7 platinum (Veteran Member)

Quote: Originally posted by cmdeal
You make an interesting point, but the point you make is actually not the point that you think you are making.

In a basic A/B split test, the whole point is to test one independent variable and one independent variable only, while keeping all other independent variables constant. In the trade, this is called the ceteris paribus principle.

Here you are actually introducing a second variable which is temporal.

If you want to measure this temporal "velocity", then the output that you need to be measuring as your y dependent variable is conversion growth, and not raw conversions, and you need to make sure you are measuring geometric growth (multiplicative) and not simply arithmetic (additive).

Actually, the above example aims to explain how and when the Bayesian method is misused.

It is true that in basic A/B split testing the main purpose is to test one variable only and hold all the others constant, but considering the dynamic nature of the data we deal with, our results show that we can achieve more reliable outcomes using better methods.

Another option when using an A/B test calculator would be to continuously adjust the data we feed it, in which case we risk losing some really important information about the data we are analyzing. In this case, for example, a Poisson process would be a better fit for our needs.

What we've built in the Optimizer is a math algorithm that takes into account and analyzes the changes in all campaign metrics over time. Performance over time is modeled as a time series, weight change suggestions take into account the changes in all other metrics, and statistical models are used to evaluate the probability of success or failure as well as to predict how performance will continue.

We suggest that a decision should not be made based on the analysis of just one variable, or of several variables studied separately. From our internal tests, we have concluded that optimization is greatly improved by relying on a series of interrelated mathematical analyses, each of which contributes to the decision with its own weight.


04-13-2018 10:16 AM #8 platinum (Veteran Member)

Quote: Originally posted by johner911
Which ML model do you use for learning?

Why do you need a proprietary algorithm? Wouldn't you achieve the same by using a sliding window for the data?
I don't think Bayesian failed in the case outlined above; the thing that failed is your time window for analysis.

If the data was analysed on a daily basis with a reserve % (e.g. 10%) traffic share going to the nonperformer, you would get much better path splitting.

Using a sliding window for the data may be an option, but it would distort the context of a campaign's overall performance, which may lead us down the wrong path.

This is just a simple example where the graphs show what is happening with the campaign performance on a day-to-day basis. Most affiliates tend to use the A/B test calculator as is, without trying to get a deeper understanding of what is happening with the campaign performance.

Can we say for sure whether a 1, 3 or 7 days’ data range gives the most reliable stats on which to run the A/B test analysis? It's quite challenging, right? As such, the best way would be to consider each campaign metric and analyze it over the full set of statistics, including granular trends.

