Over the last couple of weeks, based on this article about detecting headless browsers and this linked article about detecting PhantomJS, I've been doing a bunch of research into bot traffic.
What started out as a "let's write a better bot blocker" project turned into more of an open-ended investigation into the kind of bot traffic we see on popular traffic sources.
What's the code behind fraudulent traffic? How sophisticated is it? And how can you detect or block it?
Even in this early-stage research, which was conducted on a single popular traffic source, I found some pretty surprising results. I initially tested using very low bids in a few countries including the USA, which I know from experience is a good way to get lots and lots of bot placements. Subsequently I verified my tests using a very high bid on premium traffic.
All tests were conducted on wifi traffic to reduce false positives from click loss, and because wifi traffic tends to have more bots.
TL;DR Summary
curl -A "UserAgentString" http://url.com
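For anyone who wants to reproduce that from a script: the curl line above just sends a plain HTTP request with whatever User-Agent string you hand it. A rough Python equivalent, as a sketch only (the `build_request` helper name is mine, and the URL and user agent are the same placeholders as in the curl line):

```python
import urllib.request


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a GET request that claims to be `user_agent`."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Same placeholder values as the curl line above.
req = build_request("http://url.com", "UserAgentString")

# urllib.request.urlopen(req) would actually send it; it is left out here
# so the sketch runs without touching the network.
```

Nothing in the request itself proves the user agent is genuine, which is why a user-agent check on its own tells you very little.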
Thanks so much @caurmen, very valuable stuff! Good to know the bots are still dumb, for non-premium placements at least.
TBH I'd love to find some less-dumb bots in the wild so I could study them! Any suggestions for places (placements etc.) to look would be gratefully received.
Have a look at this: https://www.whiteops.com/hubfs/Resou...eration_WP.pdf
I've personally seen various very advanced bots around, though none like the one above (they even created their own HTTP library).
There are probably more of these types of bots out in the wild.
@thethrone - Very interesting, thanks!
Where have you seen those more advanced bots, if you don't mind me asking? I'm wondering if they're more common on some forms of traffic source than others.
@Caurmen
I used to do consultancy work for a cyber security company. Most of what I found were sophisticated traffic-generating bots, some cloaked affiliate links and cookie stuffing. Others would just sell fake traffic on shady Russian traffic exchanges etc.
Most of it was targeting Google, YouTube, Instagram, and then a broad range of "fake/hacked" blogs and websites.
I will see if they still have some of the research available, then I can send you a detailed list.
Interesting thread, and very useful for trying to stop bots from different traffic sources. This brings me to something I read not long ago in another forum, in Spanish. I think the user is also a member of STM, and he is very helpful in giving advice to other people. Basically the system is a first test to eliminate bad placements that are sending bot traffic. The example he gave was using PopAds, but it could also work with other networks.
I haven’t tried this method yet, but maybe you already know what it’s all about (maybe it doesn’t even work anymore). The idea is to create a campaign using the “PrimeSpot only” selection.

We also have to bid very low in order to get the traffic that “nobody wants”. This means that we are getting the placements where the bot traffic is coming from, and therefore we can use “Exclude Websites” after running this campaign for about three hours. The budget is up to you, but the more you invest, the more bad placements you will find. In fact, in the method that I read in this Spanish forum, we can repeat the process several times, increasing the bid and ruling out more bot placements each time.
Once we have a big list of excluded Website IDs, we can create a new campaign including the list and start bidding high, as we would usually do. The risk of this is that we can rule out good placements. As I said, this is not my method and I don’t want to get credit for it. I will be trying this system in my next campaign, so I hope it works. What do you think?
I'd recommend pairing that approach with some kind of bot detection if you're going to do it.
Even very low-bid placements, in my testing, sometimes appeared to have quite high rates of non-bot traffic. Of course there's a variety of other things that could be going on to make them less valuable, but I'd still be cautious about a blanket "exclude all cheap placements" approach.
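To make that concrete, here's a minimal sketch of what I mean by pairing the two: only exclude a cheap placement if it also fails a bot test, rather than excluding everything the low-bid campaign turned up. The `PlacementStats` fields, the function name, and both thresholds are assumptions for illustration; they aren't anything a traffic source or tracker gives you in this exact form.

```python
from dataclasses import dataclass


@dataclass
class PlacementStats:
    website_id: str   # placement / website ID as reported by the source
    visits: int       # visits seen during the low-bid test run
    bot_visits: int   # visits your own bot test flagged as bots


def placements_to_exclude(stats, min_visits=100, max_bot_rate=0.85):
    """Return IDs whose measured bot rate is above `max_bot_rate`.

    Placements with fewer than `min_visits` visits are left alone,
    because there is not enough data to judge them either way.
    Both thresholds are illustrative assumptions.
    """
    excluded = []
    for s in stats:
        if s.visits < min_visits:
            continue  # too little data, keep it for now
        if s.bot_visits / s.visits > max_bot_rate:
            excluded.append(s.website_id)
    return excluded


# Made-up numbers, purely to show the shape of the input.
sample = [
    PlacementStats("1001", visits=250, bot_visits=240),  # mostly bots
    PlacementStats("1002", visits=300, bot_visits=45),   # cheap but human
    PlacementStats("1003", visits=20, bot_visits=20),    # too few visits
]
exclude_ids = placements_to_exclude(sample)
```

With the made-up sample, only the first placement ends up on the exclusion list; the cheap-but-human one survives, which is the whole point.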
This is another Bot, with a very specific purpose: https://krebsonsecurity.com/2017/08/...-intimidation/
Great case study Caurmen!
One question though I've been asking myself.
Why is it so important to detect bots if at the end of the day you just have two options:
1 - If a publisher is profitable, keep it active, bots or not.
2 - If a publisher is losing money, you have to pause it, even if it has 0% bot traffic.
So I think a better way to optimize publishers is just to focus on cost vs revenue.
The only scenario where I can see bot detection being useful is on sources where you can actually ask for refunds.
@bbrock32 - that's a great question, and one I've thought about a lot.
To my mind, there are two really valuable uses of bot detection:
1) Early detection of placements that are very unlikely (or worse) to convert profitably because of the amount of bot traffic. Bot testing is ridiculously fast and cheap compared to the conventional approach of waiting for statistical significance on a placement, so it can be a huge money-saver.
For example: if your team is testing a source/geo combination that currently has 1000 placements, and 100 of those are so high-bot-traffic that they're spectacularly unlikely to ever generate positive ROI (say, above 85% bot traffic at the same bid as everything else), it'll cost around 200x - 300x your payout to filter all of those out by waiting for conversion results. However, your team can get solid bot testing results with 100 or so impressions on that placement, which (broad strokes here) is likely to be between 1/8th and 1/16th of the cost. (Depending on your methodology you could go even lower than that, but I'm being conservative here and assuming that the bot traffic isn't 100% consistent.)
That's a pretty big money saver any time you're testing a new geo or traffic source, and it's still useful on a maintenance-mode campaign if you can run the bot test with minimal ROI drop, because of the daily addition of new fraudulent placements to the exchanges you're using.
Heck, depending on the traffic source, it's even worth running a test like this just to filter out the 100% bot placements. They're definitely not going to convert, and you can run the test with even less traffic. If a $10 spend helps you eliminate 30 high-traffic placements that are just pure 100% fraud right off the bat, that's a pretty good investment.
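As a rough illustration of why 100 or so impressions can be enough, here's a sketch that puts a confidence interval around a measured bot rate. It uses the standard Wilson score interval; that's my choice for the example rather than a description of any particular tool, and the counts in the comments are made up.

```python
import math


def wilson_interval(bot_hits: int, total: int, z: float = 1.96):
    """Wilson score interval for the true bot rate of a placement.

    bot_hits: impressions flagged as bots by your test
    total:    impressions tested
    z:        normal quantile (1.96 is roughly 95% confidence)
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = bot_hits / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, centre - half), min(1.0, centre + half)


# 90 bots out of 100 impressions: is the placement really that bad?
low_100, high_100 = wilson_interval(90, 100)

# A "pure fraud" placement: 30 bots out of only 30 impressions.
low_30, high_30 = wilson_interval(30, 30)
```

If the lower end of the interval is already above whatever bot rate you consider unworkable, you can cut the placement without waiting for conversion data. For placements that test as 100% bots, the lower bound climbs quickly even on small samples, which is why that case needs less traffic.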
2) Knowing what percentage of bot traffic your high-value placements have can be really valuable for further optimisation, as it gives you much more accurate information on what's really going on down there.
If you know, for example, that you've got a huge volume, borderline-profitable placement with 60% bot traffic, you have information that you wouldn't have otherwise: you know that placement's human visitors are super-high-value. At that point, you can start looking for ways to eliminate the bot traffic from your bidding. Are there useragents it always uses or never uses? Does it ebb and flow by time of day? Can you narrow down the IP block that the bots are coming from then eliminate that from your bidding? If you're buying at a large enough scale (as I know you are) you can potentially even talk to the traffic source directly about incorporating better bot filtering into their technology.
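Here's a minimal sketch of how you might start answering the useragent and time-of-day questions from your own click logs. It assumes each visit is a dict with `hour`, `user_agent` and `is_bot` keys; those field names are hypothetical, so map them to whatever your tracker actually exports.

```python
from collections import defaultdict


def bot_share_by(visits, key):
    """Return {value_of_key: fraction of visits flagged as bots}."""
    totals = defaultdict(int)
    bots = defaultdict(int)
    for visit in visits:
        bucket = visit[key]
        totals[bucket] += 1
        if visit["is_bot"]:
            bots[bucket] += 1
    return {bucket: bots[bucket] / totals[bucket] for bucket in totals}


# Made-up visits for one placement, just to show the shape of the data.
visits = [
    {"hour": 3, "user_agent": "UA-A", "is_bot": True},
    {"hour": 3, "user_agent": "UA-A", "is_bot": True},
    {"hour": 3, "user_agent": "UA-B", "is_bot": False},
    {"hour": 14, "user_agent": "UA-B", "is_bot": False},
    {"hour": 14, "user_agent": "UA-A", "is_bot": True},
    {"hour": 14, "user_agent": "UA-B", "is_bot": False},
]

by_hour = bot_share_by(visits, "hour")
by_user_agent = bot_share_by(visits, "user_agent")
```

If one useragent or one block of hours carries nearly all the bot traffic, that's the thing to target or exclude in your bidding, assuming your traffic source lets you.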
Likewise, you can have your team look at the placement elsewhere and see if it's still producing that much bot traffic. Perhaps it's available on a bunch of other traffic sources but some of them have better bot filtering (and that implies that their other placements are better filtered too). Or they can check its bot percentage in other geos - I've noticed that bot traffic on placements varies widely by geo. You may be able to find another geo where that high-quality traffic has far fewer bots mixed in.
Finally, as you do more testing and run more campaigns, knowing what bot percentage each placement/campaign/source was running is a good way to further optimise. You may, for example, be able to conclude that 99% of placements with more than 40% bot traffic just don't hit profitability on Traffic Source A - and that may well be the only identifying factor all those placements have in common. You can tell your team to bot test and proactively eliminate those whenever they run a new campaign, giving you a close-to-unbeatable competitive advantage over someone coming in naively who doesn't have that information and thus has to eliminate those placements the expensive way.
(A minor additional note on this - I've seen some success in the past identifying huge placements that absolutely won't convert, and that barring a sophisticated test I'd just have assumed were bot-ridden hellholes and auto-blocked. I recall Grindr was like this: on general campaigns it converted horrifically, but bot-testing it revealed it had an almost zero bot count. Low-priced traffic (because it doesn't convert easily), huge volume, and very high quality in terms of being actual humans - it was a super-valuable source if you targeted campaigns directly to it. But without a bot test, it just looks like another bot placement.)
---
Other than those two uses (and the third use you mention of "ask for a refund", which in my experience is easier the more solid and technically-grounded your bot test is), I'd agree a lot of people get too hung up on bots. There are going to be bots in your traffic - it's the cost of doing business in 2017. If the placement's profitable despite the fact half the clicks you're paying for are just curl on a loop, it's still a profitable placement.
I think of bot testing as being just like any other source of tracking information: it gathers information, and that's it. Sometimes - in fact, quite often - you can use that information to predict behaviour, and make cutting decisions based on that.
Hello, excuse me, I would like to ask a question, almost 2020, do you have a better way to filter the bot traffic? Thank you!
Several trackers now have this incorporated, so you can check the level of suspicious traffic directly in the tracker and break it down by placement...
Do you happen to use any of these ones?
https://stmforum.com/forum/showthrea...-fraud-traffic
or pm me for a demo
Aksana,
skype live:a.rudovich