Home > The Newbie Zone > Questions and Answers

How to analyze those LPs stats ? What to cut ? (14)


06-24-2017 05:37 PM #1 greedy (Member)
How to analyze those LPs stats ? What to cut ?

Hi,

I'm opening this topic because I don't know how to optimize these LPs after analyzing the stats.

Let me explain :

On the 16th, I added a new LP (LP 7) to my tests. I added it to 2 campaigns, but they are really the same campaign (same offer, same LPs, same targeting) on the same traffic source, except that one is a Whitelist campaign and the other is RON (without the placements from the Whitelist).

Here are the stats from 16th to now for the Whitelist campaign :



As you can see, LP 7 is performing a bit better, but the difference is not significant at all.

Now let's see the same stats but from 18th to now :



So here I can see that LP 7 really performs better, and I can cut LP 6.

But should I believe stats from 16th to now or from 18th to now ?

And now, let's see the stats for the RON campaign from 16th to now :



So now the LP 6 is performing better !

---

So if I look at the Whitelist campaign's stats from the 16th to now, they say that I should keep running.

If I look at the Whitelist campaign's stats from the 18th to now, they say that I should cut LP 6.

If I look at the RON campaign's stats from the 16th (and also from the 18th) to now, they say that LP 6 is the BEST!

---

I'm totally lost about those stats... Should I cut LP 6 on the Whitelist campaign and cut LP 2 on the RON campaign ?

Should I treat those 2 campaigns as one campaign and only look at the global stats ?

Thanks in advance for your help.


06-24-2017 06:07 PM #2 vortex (Senior Moderator)

I understand your pain.

You'll get different results on different days, from different placements, different OSs etc.

When it comes to campaign optimization, I find that I'm always having to bring up the concept of accuracy vs. efficiency. These are like yin-yang where when you have more of one, you'll have less of the other.

If you want results to be highly accurate, you could test and cut landers for every set of placements (or even every placement), every OS etc., and only make a cutting decision after collecting at least one week's worth of data. But that would be unnecessarily expensive for most purposes (especially given how short-lived most pop camps are, striving for high accuracy may not be worth it).

So, you'll just need to find a sweet spot between being 100% accurate and 100% efficient. I would normally just use RON traffic and cut landers when stats have reached 0-5% (or 95-100% for the best lander) "probability of being best", and leave it at that. Yes I'm aware that the "best" lander I identify that way, may not STILL be the best lander for a subset of the placements (e.g. a whitelist of placements) or a particular OS, or even for another day of the week. But if I test enough landers, chances are that in the long run and generally speaking, the winner would still have a better chance of ending up in profits, compared to NOT using stats tools.

This is also why, in spite of my efficient (aka lazy) nature, I still insist on using stats calculators to cut landers and offers. Even cutting at statistical significance will not guarantee a high degree of accuracy - so imagine how much more unreliable your "best" lander would be if you don't wait for stat sig!

Now - back to your stats. Since you already have stats for both RON and your whitelist camp, I would suggest to just cut landers for each camp individually because that would be more accurate. It's more expensive (i.e. less efficient) to cut this way, but you get more accuracy. As you've already included the new lander 7 in both camps and collected stats, take advantage of them. Next time though, you'll have 2 options when it comes to testing new landers: You could just add the new lander(s) to the RON camp, find the best, and apply the same winning lander to the whitelist camp - which would be the less accurate but more efficient (cheaper) way. Or you could do the same as what you did this time: Test the new lander(s) on both RON and whitelist camps and cut separately - which would be the more accurate but less efficient (more expensive) way. The choice is yours to make.

As for timeframe, generally speaking the longer the time period your stats are collected over, the better. But again, having to collect more stats over longer periods of time would be less efficient.

Hope I haven't bored you with my stats talk!



Amy


06-24-2017 10:37 PM #3 osmiumman (Member)

Use an A/B significance calculator and wait until you get 95%. That's not the case yet with your RON campaign, for example.
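
For anyone curious what an A/B significance check looks like under the hood, here is a minimal sketch of a classic two-proportion z-test in Python. It is only an assumption that any given online calculator works this way; the tools named later in this thread report a "probability of being best", which is a different (Bayesian) quantity. The function name and the sample figures are hypothetical.

```python
import math

def two_proportion_confidence(conv_a, visits_a, conv_b, visits_b):
    """Two-sided confidence (0..1) that two conversion rates really differ.

    Uses the pooled two-proportion z-test with a normal approximation,
    so it is only reasonable when both landers have plenty of visits.
    """
    rate_a = conv_a / visits_a
    rate_b = conv_b / visits_b
    pooled = (conv_a + conv_b) / (visits_a + visits_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visits_a + 1 / visits_b))
    if std_err == 0:
        return 0.0  # no conversions at all (or all converted): nothing to compare
    z = abs(rate_a - rate_b) / std_err
    # Two-sided p-value: 2 * (1 - Phi(z)) == erfc(z / sqrt(2))
    p_value = math.erfc(z / math.sqrt(2))
    return 1 - p_value

# Hypothetical example: 12 conversions on 1000 visits vs. 9 on 1000 visits.
confidence = two_proportion_confidence(12, 1000, 9, 1000)
print(f"confidence that the rates differ: {confidence:.1%}")
print("cut the loser" if confidence >= 0.95 else "keep running")
```

With numbers this close the check stays far below 95%, which is the "keep running" situation described above.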

PS: Nice ROI, congrats!


06-26-2017 12:40 AM #4 vortex (Senior Moderator)

Quote Originally Posted by osmiumman View Post
Use an A/B significance calculator and wait until you get 95%. That's not the case yet with your RON campaign, for example.

PS: Nice ROI, congrats!
Ooh nice catch! It's such a fundamental concept, that I forgot to mention it altogether!

This is the method to use for cutting landers. 90% "probability of being best" is the minimum suggested cut-off. 95-100% preferred.
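
As a rough illustration of where a "probability of being best" figure can come from, here is a minimal Python sketch. It assumes a uniform Beta(1, 1) prior on each lander's conversion rate and estimates the probability by random sampling; this is a common textbook approach, not necessarily what any particular calculator does. The function names and the sample stats are hypothetical.

```python
import random

def prob_being_best(stats, draws=20000, seed=0):
    """Estimate each lander's probability of having the highest conversion rate.

    stats maps a lander name to (conversions, visits).
    Each rate is drawn from its Beta posterior (uniform prior assumed),
    and we count how often each lander comes out on top.
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in stats}
    for _ in range(draws):
        best_name, best_rate = None, -1.0
        for name, (conversions, visits) in stats.items():
            rate = rng.betavariate(conversions + 1, visits - conversions + 1)
            if rate > best_rate:
                best_name, best_rate = name, rate
        wins[best_name] += 1
    return {name: count / draws for name, count in wins.items()}

def landers_to_cut(probabilities, threshold=0.05):
    """Landers whose probability of being best has dropped below the threshold."""
    return [name for name, p in probabilities.items() if p < threshold]

# Hypothetical stats, not taken from the screenshots in this thread.
stats = {"LP 2": (14, 1200), "LP 6": (22, 1200), "LP 7": (25, 1200)}
probabilities = prob_being_best(stats)
for name, p in probabilities.items():
    print(f"{name}: {p:.1%} probability of being best")
print("cut:", landers_to_cut(probabilities, threshold=0.05))
```

Setting threshold=0.10 corresponds to the 90% minimum cut-off, and 0.05 to the stricter 95% one.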

https://stmforum.com/forum/showthrea...Banners-Part-1



Amy


06-26-2017 06:15 AM #5 greedy (Member)

Wow, absolutely awesome! I didn't expect such a fast and complete answer, thanks so much Amy!

I read your answer as soon as you posted it, but I'm only replying now because, I have to admit, I've read your message around 10 times.

Now I understand what you mean, and I see things more clearly, even if it's hard to grasp.

So now here are the stats from 17th to now :

WhiteList campaign :



> So here LP 2 is the best, but only at 79% against the worst (LP 6), so I'll keep running the 3 LPs until it reaches stat sig.

RON campaign :



> Here LP 2 is the worst ! For me it's unbelievable that from the best on the WL campaign it becomes the worst on the RON campaign... Anyway this LP 2 has only 7% against the best one (LP 6). So I cut LP 2 now...

But if I look at the stats from the 16th to now (1 more day), LP 2 has 19% against LP 6. But I've decided to cut it anyway, I hope I'm not wrong.

Quote Originally Posted by osmiumman View Post
Use an A/B significance calculator and wait until you get 95%. That's not the case yet with your RON campaign, for example.

PS: Nice ROI, congrats!
Yes I always use PeakConversion for the LPs and Win Vector for the offers.

But I usually wait until it reaches 90%, not 95%. Am I wrong?

Thanks a lot !


06-26-2017 07:28 PM #6 vortex (Senior Moderator)

> Here LP 2 is the worst ! For me it's unbelievable that from the best on the WL campaign it becomes the worst on the RON campaign... Anyway this LP 2 has only 7% against the best one (LP 6). So I cut LP 2 now...

But if I look at the stats from the 16th to now (1 more day), LP 2 has 19% against LP 6. But I've decided to cut it anyway, I hope I'm not wrong.

But I usually wait until it reaches 90%, not 95%. Am I wrong?
Thanks for taking the time to understand the concept! Knowing what to cut is not as straight-forward as many people may think.

Let's put it this way: 90% vs. 95% vs. 100% - you can cut at ANY percentage you want really. Going back to the Accuracy vs. Efficiency concept, there isn't a 100% right way or wrong way of doing things.

If you cut at 100%, you'll be more accurate (i.e. it would be more likely for your winning lander to perform the best in the long run), but you will need to spend more money to reach stat sig.

If you cut at 90%, you'll be less accurate (i.e. it would be less likely for your winning lander to perform the best in the long run), but you'll spend less money to reach stat sig.

I used to cut at 90%, but after more thinking and also at the suggestion of matuloo, I decided that it would be worth the money to cut at 95-100%, just because landers are such a crucial part of the campaign.

But this is my personal belief only. I don't have stats to support that the additional accuracy is worth the money - in order for me to prove this, I would have to run at least 10 lander tests, 2 camps per test, both running the same landers - one camp I would cut at 90% and the other at 100%. Then see which camp does better. And nobody has this sort of time (least of all me).

All I know is that when I cut at 95-100%, it makes me feel more confident about the results, and things are working out, so I'm sticking to that.

All I can advise is to NOT cut below 90% - because my teacher caurmen advises not to, and he knows his stats.



It IS annoying that you're seeing opposite trends in your RON camp vs. your whitelist camp, where LP 2 is the best in one but the worst in another. But really, I would advise to just focus on the bigger picture: This is only one campaign out of the many you will be running. Just do your reasonable best, and move on.

Here's the big picture: You don't need to be right every single time when trying to identify the best lander of the bunch. You only need to be right as often as possible without breaking the bank - and using stats calculators WILL help you to do that.



Having said THAT - here are some possible causes of the contradiction (of your LP 2 performing the best in one camp but the worst in the other):

-The nature/topic of the placements in both camps may be different. For example let's say you're running a sweeps offer, and LP 2 is a spinning wheel or slots lander, and the placements in the whitelist camp are mostly gaming or gambling-related sites, then that would explain the better performance.

-The OS/Browser/Device/etc. makeup of the 2 camps may be different. This is actually something you can check and compare. Landers will often perform differently for different OSs and browsers. So if for example, LP2 does better on Android, and your RON camp has a smaller percentage of Android traffic than your whitelist camp does, then this could happen.

Note: For big campaigns, it may be worth the extra effort to cut landers (and offers!) for each OS and/or browser separately. (Tip: Use browserstack to confirm that the lander looks good on/in all popular OSs and browsers - sometimes the difference in performance is caused by the lander not being displayed correctly in certain browsers.)
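
To make the OS-mix explanation concrete, here is a small worked example in Python. All of the numbers are hypothetical and chosen only to show the mechanism: the same two landers, with the same per-OS conversion rates, can swap places once the share of Android traffic changes.

```python
def blended_rate(os_share, rate_by_os):
    """Overall conversion rate of a lander, given the traffic mix and per-OS rates."""
    return sum(os_share[os_name] * rate_by_os[os_name] for os_name in os_share)

# Hypothetical per-OS conversion rates (not taken from the stats in this thread).
lp2_rates = {"android": 0.03, "ios": 0.01}  # LP 2 is strong on Android only
lp6_rates = {"android": 0.02, "ios": 0.02}  # LP 6 is steady everywhere

# Hypothetical traffic mixes for the two camps.
whitelist_mix = {"android": 0.8, "ios": 0.2}
ron_mix = {"android": 0.3, "ios": 0.7}

for camp, mix in (("whitelist", whitelist_mix), ("RON", ron_mix)):
    lp2 = blended_rate(mix, lp2_rates)
    lp6 = blended_rate(mix, lp6_rates)
    winner = "LP 2" if lp2 > lp6 else "LP 6"
    print(f"{camp}: LP 2 = {lp2:.2%}, LP 6 = {lp6:.2%}, winner = {winner}")
```

In this made-up case LP 2 wins in the whitelist camp and loses in the RON camp, even though neither lander changed. Comparing the OS breakdown of both camps is the way to check whether something like this is going on.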



In the end, it all goes back to the Accuracy vs. Efficiency concept. There are lots of things you can do to cut more accurately, but the more accurate you are, the more time and money you need to spend (i.e. you sacrifice on efficiency). So if you want extra accuracy, make sure the extra effort and cost is justified, e.g. when you're running in a big geo that has scaling potential.



Amy


06-28-2017 04:11 AM #7 greedy (Member)

Thanks again Amy for all your awesome information!

I'll change my method to cut around 95%, not 90%

About the contradiction between the LP 2 results on both campaigns, it's really the same campaign with exactly the same targeting (OS, Browsers etc.) EXCEPT the placements.

Actually even the bid is the same these days.

So I can hardly understand why the results are different.

And even the LPs are almost identical, only one piece of text (the headline) is different.


06-28-2017 09:19 PM #8 vortex (Senior Moderator)

In that case, then I'm leaning towards this as a possible cause:

-The nature/topic of the placements in both camps may be different. For example let's say you're running a sweeps offer, and LP 2 is a spinning wheel or slots lander, and the placements in the whitelist camp are mostly gaming or gambling-related sites, then that would explain the better performance.
Either way - the key is to just do your best and move on. This is only one of your many campaigns (that you'll be launching in the future). It may not be worth it to try to get to the bottom of what's actually causing the contradiction.



Amy


06-29-2017 07:46 AM #9 osmiumman (Member)

Quote Originally Posted by greedy View Post
About the contradiction between the LP 2 results on both campaigns, it's really the same campaign with exactly the same targeting (OS, Browsers etc.) EXCEPT the placements.
There's usually a reason why some counter-intuitive results happen. My guess is that the result just happened by chance. Of course, Amy could be right as well.
Why not continue the test for another 1-2 weeks?

You might also want to read a bit about multi-armed bandit tests. This is what Google does in Adwords. As soon as first results are there, the better ad is getting a bit more traffic. So they don't wait for 95% statistical significance. This way, they "waste" less traffic on probably worse performing ads.

While this is complicated stuff to implement, you could use a simpler version that also many other affiliates use: give the "probably" winning LP (but not yet statistically significant) 80% of the traffic, and the other one 20%.
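
Here is a minimal Python sketch of that simpler 80/20 version. It is not how Google or any ad network implements bandit testing; it only shows the idea of giving the current leader a fixed share of traffic and splitting the rest. The function names and stats are hypothetical, and every lander is assumed to have at least one visit.

```python
import random

def traffic_weights(stats, leader_share=0.8):
    """Give the lander with the best observed conversion rate `leader_share`
    of the traffic and split the remainder evenly among the others.

    stats maps a lander name to (conversions, visits), with visits > 0.
    """
    if len(stats) == 1:
        return {name: 1.0 for name in stats}
    leader = max(stats, key=lambda name: stats[name][0] / stats[name][1])
    rest_share = (1.0 - leader_share) / (len(stats) - 1)
    return {name: (leader_share if name == leader else rest_share) for name in stats}

def pick_lander(weights, rng=random):
    """Pick the lander for the next visitor according to the weights."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

# Hypothetical stats: LP 6 is slightly ahead but not yet statistically significant.
stats = {"LP 6": (12, 1000), "LP 7": (9, 1000)}
weights = traffic_weights(stats)
print(weights)
print("next visitor goes to:", pick_lander(weights))
```

Recomputing the weights every so often (for example, once a day) lets the split follow whichever lander is currently ahead, while the weaker one still collects enough data to catch up if its early results were just bad luck.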


06-30-2017 07:42 PM #10 vortex (Senior Moderator)

Quote Originally Posted by osmiumman View Post
There's usually a reason why some counter-intuitive results happen. My guess is that the result just happened by chance. Of course, Amy could be right as well.
Why not continue the test for another 1-2 weeks?

You might also want to read a bit about multi-armed bandit tests. This is what Google does in Adwords. As soon as first results are there, the better ad is getting a bit more traffic. So they don't wait for 95% statistical significance. This way, they "waste" less traffic on probably worse performing ads.

While this is complicated stuff to implement, you could use a simpler version that also many other affiliates use: give the "probably" winning LP (but not yet statistically significant) 80% of the traffic, and the other one 20%.
Neat concept!

Some ad networks have this built into their algorithms. And if my memory weren't so bad, I'd remember my experiences with this approach...

But in the end though, I'd strongly suggest to still wait for at least 90% to be reached before actually making a cutting decision.

Makes me wonder though: What if the worse candidate just happens to "appear" to perform well in the beginning - for example bam it gets 2 "lottery" conversions - and then we assign more traffic to it - would that end up delaying the 90% or 95% stat sig to be reached, and therefore wasting even more money?

I guess it depends on how often such exceptions will occur. Yours certainly sounds like a sensible approach!



Amy


07-01-2017 02:55 PM #11 greedy (Member)

Just to let you know, after letting the LPs run a little longer on my RON campaign, LP 7 now has exactly the same stats as LP 6. So I'll follow your rules, Amy, and wait for 95% stat sig to be reached.

Now I have another question for another campaign.

It's about an offer landing page:

A few hours ago I added another landing page from the same offer. If I look at the stats since I added it, each LP has received around 500 visits, I've spent around $14 on each, and the offer payout is around $4-$6 (the payout is lower on the new LP).

The original LP got 6 conversions (~$20 profit), and the new LP got only 1 conversion (~$10 loss).

If I look at the stats on Win Vector, it says that the original LP has a 99% probability of being the best.

But my question is: is it too early to cut it? I mean, $14 is only 3.5 times the payout, so is that enough to judge it?

Thanks in advance for your help. :$


07-02-2017 07:36 PM #12 vortex (Senior Moderator)

Just to let you know, after letting the LPs run a little longer on my RON campaign, LP 7 now has exactly the same stats as LP 6. So I'll follow your rules, Amy, and wait for 95% stat sig to be reached.
Haha nice to hear! There are so many variables that can influence stats, such that a higher degree of accuracy would be good for important stuff like offers and landers.


A few hours ago I added another landing page from the same offer. If I look at the stats since I added it, each LP has received around 500 visits, I've spent around $14 on each, and the offer payout is around $4-$6 (the payout is lower on the new LP).

The original LP got 6 conversions (~$20 profit), and the new LP got only 1 conversion (~$10 loss).

If I look at the stats on Win Vector, it says that the original LP has a 99% probability of being the best.

But my question is: is it too early to cut it? I mean, $14 is only 3.5 times the payout, so is that enough to judge it?
Let me guess: Did you compare stats since the start of the campaign?

Every time you add a new lander, you're effectively starting a new round of split-testing. When you compare lander stats after that point, you'll need to look at the period starting from the time you added the new lander(s).

There will be hour-to-hour and day-to-day fluctuations in performance. So if you take the stats from an old LP and compare to stats of a new LP, it wouldn't be a fair comparison.
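
The "compare only from the time the new lander was added" rule can be sketched in Python like this. The row layout, function name and figures are all hypothetical; a tracker would normally do this for you when you set the report's date range.

```python
from datetime import datetime

def totals_since(rows, start):
    """Sum visits and conversions per lander, using only rows at or after `start`."""
    totals = {}
    for timestamp, lander, visits, conversions in rows:
        if timestamp < start:
            continue  # traffic from before the new lander existed is ignored
        seen_visits, seen_conversions = totals.get(lander, (0, 0))
        totals[lander] = (seen_visits + visits, seen_conversions + conversions)
    return totals

# Hypothetical hourly rows: (hour, lander, visits, conversions)
rows = [
    (datetime(2017, 7, 1, 9), "original", 400, 5),   # before the new lander
    (datetime(2017, 7, 1, 12), "original", 250, 3),
    (datetime(2017, 7, 1, 12), "new", 250, 1),
    (datetime(2017, 7, 1, 13), "original", 250, 3),
    (datetime(2017, 7, 1, 13), "new", 250, 0),
]
new_lander_added = datetime(2017, 7, 1, 12)

fair = totals_since(rows, new_lander_added)
print(fair)
```

Only the totals in `fair` should go into the stats calculator, so that both landers are judged on the same hours of traffic.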

Do the comparison again and let me know how it goes!




Amy


07-03-2017 03:02 AM #13 greedy (Member)

Quote Originally Posted by vortex View Post
Let me guess: Did you compare stats since the start of the campaign?

Every time you add a new lander, you're effectively starting a new round of split-testing. When you compare lander stats after that point, you'll need to look at the period starting from the time you added the new lander(s).

There will be hour-to-hour and day-to-day fluctuations in performance. So if you take the stats from an old LP and compare to stats of a new LP, it wouldn't be a fair comparison.

Do the comparison again and let me know how it goes!
Hi Amy,

No no I compared the stats from the hour I added the new offer, not from the start of the campaign.

But I didn't know whether I should cut after spending only 3.5 times the average payout.

In the end, after posting my question, I kept it running, and here are the stats now:



So "Offer 2" is the offer I was talking about, and it's the only one in the negative.

And here are the stats day by day for Offer 2 only since I added it :



From the first screenshot, Win Vector says that Offer 2 has only 4% against Offer 1, so I just cut it.

But for yesterday, it was not the worst offer, Offer 2 was better than Offer 3, it's really confusing.


07-03-2017 08:59 PM #14 vortex (Senior Moderator)

AH....

Apologies! I re-read your post and understand what the situation is now - I need to read more slowly next time!

The peakconversion calculator does not take payout into account, because it's not relevant for the calculation - all it needs to know to compare 2 things are the numbers of impressions and conversions. So you don't need to worry whether there were enough impressions for the decision to be accurate - if the calculator says that something only has <10% probability of being best, then that's the verdict. And that means the sample size is large enough for it to come to this verdict.

So in short, we don't need to worry about sample size. The calculator takes that into account.
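
A quick way to see how sample size is baked into the verdict is the Python sketch below. It assumes a simple Beta-posterior comparison with a uniform prior, which may differ from what peakconversion or Win Vector actually compute, and the function name is hypothetical. The first two calls use the same observed conversion rates (1.2% vs. 0.8%) at two very different sample sizes; the last call uses the approximate figures from the question above.

```python
import random

def prob_a_beats_b(conv_a, visits_a, conv_b, visits_b, draws=20000, seed=0):
    """Estimate the probability that A's true conversion rate is higher than B's.

    Each rate is sampled from its Beta posterior (uniform prior assumed).
    """
    rng = random.Random(seed)
    a_wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(conv_a + 1, visits_a - conv_a + 1)
        rate_b = rng.betavariate(conv_b + 1, visits_b - conv_b + 1)
        if rate_a > rate_b:
            a_wins += 1
    return a_wins / draws

# Same observed rates (1.2% vs. 0.8%), very different amounts of data.
small_sample = prob_a_beats_b(6, 500, 4, 500)
large_sample = prob_a_beats_b(600, 50000, 400, 50000)
print(f"500 visits each:    {small_sample:.1%}")
print(f"50,000 visits each: {large_sample:.1%}")

# Roughly the figures from the question: 6 vs. 1 conversions on ~500 visits each.
print(f"6 vs. 1 on 500 visits each: {prob_a_beats_b(6, 500, 1, 500):.1%}")
```

With little data the probability stays well short of a cutting threshold, and with a lot of data the same rates give a near-certain verdict. That is the sense in which a high percentage already implies the sample was large enough. The last figure will not necessarily match Win Vector's 99%, since that tool may use a different model.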



Having said THAT - there are 2 things the calculator cannot know:

1)Whether there are delayed conversions. Sometimes a visitor will have the offer page open but not finish conversion until a day or two later. Or the aff network doesn't post back the conversions until a few hours later. You can still cut offers as the calculator tells you to, but keep an eye on the cut offer afterwards to make sure it doesn't all of a sudden get a few more conversions. (And if it does, you'd need to run the calculator again, and retest the offer if necessary.)

2)A general volatility in campaign performance. It's true that the volatility will be experienced by all offers somewhat equally because you're rotating them evenly, but this volatility will still skew the stats somewhat, and there's little else we can do about that. One thing that WILL help would be to run the offers test over longer periods, so for example throttle the traffic so that you collect data over 3 days instead of 1 before deciding which offers to cut. Another thing that will help, would be to test more offers at one time - this way you don't need to throttle the traffic, as each offer will get only a fraction of the traffic each day, and the testing will be drawn out over a longer time period naturally. This is yet another reason why I like to mass-test offers.

So - the different verdicts you've described, when comparing the offers in 2 different timeframes, are caused mainly by the 2 reasons I listed above.



Again, I need to mention the "accuracy vs. efficiency" concept here:

1)You can test fewer offers and/or collect stats over a short period of time (e.g. 1 day) and base cutting decisions on those. This is the more efficient (time-saving) approach, but has less accuracy.

OR

2)You can test more offers and/or collect stats over a longer period of time (e.g. 3+ days) and base cutting decisions on those. This is the less efficient (takes more time) approach, but is more accurate.

And it's up to you which approach you prefer.




Amy

