
How To SEEK AND DESTROY Bot Traffic To Your Campaigns (36)

09-26-2014 12:27 PM #1 caurmen (Administrator)
How To SEEK AND DESTROY Bot Traffic To Your Campaigns

Ever had this experience with a campaign?

You've got a single placement that sends tons of traffic, but it doesn't convert. So you optimise. You test. You hone.

And nothing works.

Until you realise - it's very likely that your visitors aren't... like other people.

Yep, they're bots. But how can we spot them and kill them before they start eating our testing budget?

The Theory Of Bot Detection

At the simplest level, a bot is just a computer running a very basic program to download web pages. It sits there refreshing a single page and "clicks on" (ie downloads) the banner links that appear each time.

A bot that hasn't been written in a sophisticated fashion to evade detection - as a surprisingly large number of clickbots aren't - has a number of obvious "tells":

Many bots don't bother to parse the web pages they download, or run the Javascript on those pages. Very few humans browse without Javascript now, but a lot of bots do.
Humans have to take time to read a page, or at least skim it - bots don't.
Likewise, they don't have human reaction times.

So this week I started to use these characteristics to see if I could detect bots on mainstream mobile DSP traffic - and the testing bore fruit fairly rapidly.

I was able to use a couple of basic Javascript-based approaches to differentiate some known good placements from some placements that I knew independently were bot-filled trash.

Here's a shot of the IPs coming through from one of the placements that I detected as bot-tastic - and note that I detected it without testing or following IP addresses:

So here are two ways to use the techniques I've been testing to find out if you're being attacked by the bots too!

Note: some bots are smarter than others. There are bots out there that use Javascript, mimic human behaviour, and are much harder to detect. These tests won't find them - but they will find dumber bots, of which there are a lot.

1: Find out if your live lander is getting bot traffic

This is a really simple test to see if the traffic that's landing on one of your landers is Full Of Bots. It won't slow your page down and shouldn't affect conversion or clickthrough rates at all.

On the downside, it doesn't give you tracking information - it'll just tell you if your lander's getting hit by bots, not which areas of traffic they're coming from.

Instructions

1. Create a 1x1 transparent gif and upload it to your server or the (high-speed) image site of your choice.
2. Go to Bit.ly and create an account, then create a shortened link to that image. You need an account on Bit.ly to get their statistics.
3. Add Code Block 1 (below) to your lander, just before the body tag. Replace YOUR BITLY LINK with your bit.ly link from step 2.
4. Add Code Block 2 (below) to your opening body tag, as an attribute.
5. Add Code Block 3 (below) to the bottom of your lander, inside the body.
6. Before you upload your modified lander, make a note of how many visitors that landing page has had today.
7. Upload and run traffic to your lander until you've had 100 more visitors to that lander. Pause traffic.
8. Go to your Bit.ly link and note how many times that link has been hit.

What does this mean? I'll explain below. First, the code:

Code Block 1

Code:

<script type='text/javascript'>
function changeimage(){
    document.getElementById("myimage").src="YOUR BITLY LINK";
}
</script>

Code Block 2

Code:

onload="setTimeout(changeimage, 300);"

Code Block 3

Code:

<img src="broken" id="myimage">

Interpreting your results

This test loads an image using Javascript after the page has been loaded for 300ms. Human reaction times average around 250ms. The odds of a human being able to close a browser window in less than 300ms or so are pretty small - very small on a mobile device. Likewise, only about 1% of users have Javascript disabled.

As a result, if you've got a landing page load but the image has not been loaded, the chances are that the visitor was a bot.

That makes these results pretty easy to interpret. Check the number of visitors your tracker noted on the lander (it will be around 100 if you followed the instructions above). Now check the number of image loads that the Bit.ly link shows.

If the Bit.ly number is significantly less than the tracker number, the difference between the two is likely to be the number of bots you're seeing.
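The comparison above is simple arithmetic, and can be sketched as a small function. This is only an illustration: `estimateBotShare` and its parameter names are made up for this sketch, and the counts in the usage line are hypothetical rather than results from this thread.

```javascript
// Hypothetical helper: estimates the share of lander visits that never
// loaded the delayed tracking pixel (no Javascript, or gone within 300ms).
function estimateBotShare(trackerVisits, pixelLoads) {
  if (!Number.isFinite(trackerVisits) || trackerVisits <= 0) {
    throw new RangeError("trackerVisits must be a positive number");
  }
  if (!Number.isFinite(pixelLoads) || pixelLoads < 0) {
    throw new RangeError("pixelLoads must be zero or more");
  }
  // More pixel loads than visits means the two counters disagree
  // (for example, repeat hits on the short link), so report no bots seen.
  const missing = Math.max(trackerVisits - pixelLoads, 0);
  return missing / trackerVisits;
}

// Hypothetical numbers: 100 tracked visits, 70 pixel loads.
console.log(estimateBotShare(100, 70)); // 0.3
```

Treat the result as a rough upper estimate: the roughly 1% of humans without Javascript, and anyone whose connection dropped before the image loaded, are counted as bots too.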

Unfortunately, there's no super-easy way to associate this data with your tracker data - so if we want to start drilling down and seeing what placements have bot issues without doing custom coding on a tracker, we have to use a different approach...

2: Find Bot-Free Placements, Carriers, etc

This one can't easily be run on a live campaign. It's more of a prospecting approach on a new traffic source or targeting to see which targets are just worth cutting from the outset. You can still run an offer with it, but it has to be direct-linked.

Copy the code below (Code Block 4) into a new text file. Replace "YOUR_TRACKER_CTA_LINK" with your tracker's CTA link as you'd usually put it in a landing page.
Save that as "prospect.html" or something similar, and upload it to your landing page hosting.
Set up a campaign on the traffic source you wish to test, using whatever offer you like, and using "prospect.html" as the only landing page.
Set the campaign live, and watch your stats like a hawk. As soon as any placement hits 60 clicks, pause it. Keep doing this until you either run out of traffic or have tested a good 20+ placements.

What are we doing here? Basically, we're gathering enough data on each placement to tell if it's bot-ridden or not. That's why we pause after 60 clicks - because we have enough data then, and any more is wasted.

How to interpret it? Coming right after the code block.

Code Block 4

Code:

<head>
<script type='text/javascript'>
// Sends the visitor on to the tracker's CTA link; only runs if Javascript is enabled.
function redir(){
    window.location.replace('YOUR_TRACKER_CTA_LINK');
}
</script>
</head>
<body onload="setTimeout(redir, 300);">
</body>

Interpreting Your Results

It's all in the CTR here. What this script does is really simple: it uses Javascript to automatically redirect visitors to your tracker's CTA (click-through) link after 300ms. Thus, humans (who can run Javascript and don't react in less than 300ms) will be recorded as clickthroughs, and bots won't.

Thus, any visitors who appear to have clicked through have been redirected, meaning that they're probably not bots. And any who don't click probably are!

Here are some example results - the green-highlighted placements are the ones that are probably not receiving much bot traffic:

(Note - some of the numbers here are rather larger than they should be, because I got distracted half-way through running the test and a burst of traffic caught me by surprise.)

If you want to check bot populations for carriers, countries, phone models or anything else that you can track, you can do it using the same approach - just pause based on the thing you're testing. In my tests, I found that the most strongly-differentiated bot vs no bot areas were placements, but your mileage may vary!

What should I do if I'm getting bot traffic?

Well, first thing would obviously be to pause those placements / carriers / whatever.

Some placements seem to get a mixture of dodgy and legitimate traffic - they'll be harder to profit on, but may not be impossible. I'm currently working with a rule of auto-cutting anything with less than 80% human traffic, but because these techniques are relatively new to me, I'm probably going to refine that more over time.
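As a rough sketch of how the 80% rule could be applied to the results of test 2, where the clickthrough rate stands in for the human share: the row shape (`name`, `visits`, `clicks`), the function name and the verdict labels are all assumptions made for this example, not any tracker's real export format, and the sample rows are made up.

```javascript
// Hypothetical sketch: placement rows are { name, visits, clicks }.
// The field names are assumptions, not a real tracker export format.
const HUMAN_THRESHOLD = 0.8; // the 80% rule of thumb from the post

function classifyPlacement({ name, visits, clicks }, threshold = HUMAN_THRESHOLD) {
  if (visits <= 0) {
    return { name, humanShare: null, verdict: "no-data" };
  }
  // In test 2 every visitor who gets redirected counts as a clickthrough,
  // so clicks / visits approximates the share of human traffic.
  const humanShare = clicks / visits;
  if (humanShare > 1) {
    // More clickthroughs than visits: the counts are inconsistent.
    return { name, humanShare, verdict: "check-manually" };
  }
  return { name, humanShare, verdict: humanShare < threshold ? "cut" : "keep" };
}

// Made-up example rows.
const rows = [
  { name: "placement-a", visits: 60, clicks: 55 },
  { name: "placement-b", visits: 60, clicks: 12 },
];
for (const row of rows) {
  console.log(classifyPlacement(row));
}
```

The threshold is a parameter precisely because the 80% figure is a working rule rather than a fixed constant.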

If you've had a significant amount of traffic from a bot, and particularly if you can also show that traffic is dodgy via IP logs too (always check IPs coming in from any traffic that's failing this detection rule), you should also get in touch with your traffic source and ask for a refund. You are, after all, paying for traffic from humans. In conversation with MrGreen on this topic recently, he mentioned that he'd had 5-figure refunds because of bot traffic in the past - this stuff can really add up.

And that's it! Good luck in your bot hunting, and if you have any questions, suggestions, thoughts on the methods I'm using or other comments, post 'em below!

09-26-2014 12:44 PM #2 cmdeal (Veteran Member)

Originally Posted by caurmen

If you've had a significant amount of traffic from a bot, and particularly if you can also show that traffic is dodgy via IP logs too (always check IPs coming in from any traffic that's failing this detection rule), you should also get in touch with your traffic source and ask for a refund. You are, after all, paying for traffic from humans. In conversation with MrGreen on this topic recently, he mentioned that he'd had 5-figure refunds because of bot traffic in the past - this stuff can really add up.

We have even had six figure refunds for fake traffic.

To be sure, we had to fight for it and even get the lawyers involved, but the evidence was pretty clear, and so the other side backed down before it went to court.

Awesome case study!

09-26-2014 02:14 PM #3 supeyrio (Member)

how are companies getting away with shady things like this?
isn't this like outright cheating/scam, and anyone who's involved in or approves of the setup is basically doing things against the law?
i'm not trying to take the moral high ground, just find it interesting that "legit" setups can convince their employees to do shady things like this, becomes like a syndicate or something

09-26-2014 02:24 PM #4 cmdeal (Veteran Member)

Originally Posted by supeyrio

In business you always need to be careful.

But in an industry like IM/AM where the barriers to entry are very low and you have companies run out of hard-to-reach and low regulation jurisdictions like BVI, Cyprus, Russia, etc., you have to be even more careful than normal.

09-27-2014 02:47 AM #5 qureyoon (AMC Alumnus)

Thanks for the detailed guide and howto!

Originally Posted by caurmen

....
2: Find Bot-Free Placements, Carriers, etc
....
.... we pause after 60 clicks - because we have enough data then, and any more is wasted.

This may sound newbish, but how do you determine the number 60? Any formula to follow?

Thanks again!

09-27-2014 09:25 AM #6 Mr Green (Administrator)

Originally Posted by supeyrio

The publishers are the ones being sneaky; they will do whatever they can to make money from fraudulent traffic. Even though the ad networks know about it, it's hard to keep on top of it... just like affiliates trying to sneak in cloaked campaigns.

09-27-2014 04:45 PM #7 leonidas32 (Member)

Great article! I was also looking into this for the past few days, and the javascript redirect is essential for finding dumb bots.

Revenuehits has a list of sources you can blacklist:
Ezanga
Adometry's Known Offenders
Chameleon (on all)
Adsimilate
Adometry's Colo

I thought this was interesting.

DistilNetwork has a paid approach of detecting bot traffic through a more sophisticated method:
http://www.distilnetworks.com/buildi...ct-block-bots/

Botscout is a database of known bots, but I don't know how to implement this code for simple redirects:
botscout.com/code.htm

Edit: Found this little article
http://www.webmasterworld.com/search...rs/4619880.htm

"Well ladies and gents, the playing field has completely changed and is topsy turvy these days as not only are bot using JavaScript the crawlers are actually WRITTEN in JavaScript!

They may also be taking screen shots but that's just icing on the crawling cake.

The technology making all this happen is called Node.js [nodejs.org...] which is a very powerful platform built on Chrome's JavaScript runtime. "

09-27-2014 05:20 PM #8 rafael (Member)

I ran some popunder ads on an ad network like 6 months ago. I stopped those campaigns ages ago and I'm still getting hits every single day. They even click through to the offer, so I get like 20 clicks per day to the offer.

When I check Voluum for the ISP, it's always a hosting facility like A1colo.com or something like that.

It sucks because it throws off my stats, and the offer owner might think I'm sending them fake traffic. But I never thought that I was actually paying for this traffic.

09-28-2014 03:01 PM #9 caurmen (Administrator)

@qureyoon - 60's a solid number to establish reasonable statistical certainty within the kind of clickthrough ranges we're expecting. I established it by stabbing the "show statistically predicted values" button on my tracker and looking at the results I had so far.
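For anyone without such a button, one way to get a feel for what 60 clicks buys is a confidence interval on the clickthrough rate. The sketch below uses the Wilson score interval, a standard formula swapped in here for illustration; it is an assumption that this resembles what the tracker computes, and `wilsonInterval` is a made-up name.

```javascript
// Wilson score interval for a proportion (95% by default, z = 1.96).
// A standard formula used for illustration only; it is not necessarily
// what any particular tracker computes.
function wilsonInterval(successes, trials, z = 1.96) {
  if (trials <= 0 || successes < 0 || successes > trials) {
    throw new RangeError("need 0 <= successes <= trials and trials > 0");
  }
  const p = successes / trials;
  const z2 = z * z;
  const denom = 1 + z2 / trials;
  const centre = (p + z2 / (2 * trials)) / denom;
  const half =
    (z * Math.sqrt((p * (1 - p)) / trials + z2 / (4 * trials * trials))) / denom;
  return { lower: centre - half, upper: centre + half };
}

// 48 clickthroughs out of 60 visits (80% observed).
const { lower, upper } = wilsonInterval(48, 60);
console.log(lower.toFixed(2), upper.toFixed(2)); // roughly 0.68 and 0.88
```

At 60 visits the interval is about ten points either side of the observed rate, which is enough to tell a mostly-bot placement from a mostly-human one, though not to separate 75% from 80%.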

@leonidas32 - INTERESTING.. There are a number of baseline libraries that anyone using Node for this type of task is likely to use, and there's every chance that they have some detectable signatures. If that's the case, said more advanced bots might be somewhat less hidden than they think, at least once I've written the detection tools...

09-28-2014 10:48 PM #10 anarchy (Member)

I was doing some testing and I can't seem to get it to record clicks on bit.ly
If I take out the setTimeout, bit.ly records clicks just fine, but when I add it stops working
Here's the code I'm currently using:
<head>
<script>
function changeImage() {
document.getElementById('myimage').src = "bitly link";
}
onload = setTimeout(changeImage, 300);
</script>
</head>
<body>
<img src="broken" id="myimage">
</body>

If you see a mistake, let me know. I appreciate it.

09-28-2014 10:54 PM #11 zeno (Administrator)

Remove the onload line.

Then change your body tag to <body onload="setTimeout(changeImage, 300);">

You define the changeImage() function in the head before page load. Then, you tell the browser to run this function when the body loads.

09-29-2014 02:29 AM #12 anarchy (Member)

cool cool, thanks zeno, appreciate it

09-29-2014 11:01 AM #13 tomsko (Member)

Thanks @caurmen
Seems like I'm getting a ton of bot traffic, so I've set everything up now and am waiting to get my test campaign approved.

09-29-2014 11:58 AM #14 leonidas32 (Member)

On a side note, I got some kind of spyware on my new Android, and now it keeps popping someone's lander.

Unfortunately for that advertiser, they are paying for CPVs.

Also, the page is loading completely, so it's definitely not being blocked by a simple javascript redirect.

09-29-2014 03:31 PM #15 caurmen (Administrator)

Interesting. If you happen to figure out what the spyware was, I have a sandboxed virtual Android machine with its name on it...

10-13-2014 11:36 AM #16 h0mp (Member)

A while ago I got quite a lot of bot hits from app traffic. They went straight to my lander, including parameters given by my server on the first visit.

Around 1000 hits daily to this day... doesn't cost me anything though. I think it's some kind of malware.

10-13-2014 08:23 PM #17 grikis (Member)

I see you have detected a lot of bot traffic using the javascript method.

The traffic source I am using has hundreds of bot-like targets, but they all run/execute my javascript tracking code (the script passes the actual click on the LP to my tracker). Clicks match 90% of the time.

So to me, it seems like these bots are smarter than that. They might be using real browsers to cover their asses.

10-13-2014 11:04 PM #18 jennatalia (AMC Alumnus)

javascript is one way to catch a bot.

Another is IP Address, especially if it points to a datacenter.

I would also look into buying screen recording software, or something to capture mouse movement on the page. Filter out places with little to no mouse movement. Those tend to be bots.

11-05-2014 10:19 PM #19 mykeyfocus (Member)

How about only redirecting on a mobile "touch" event, which only a real visitor would trigger? Some information is available here: http://www.javascriptkit.com/javatut...chevents.shtml. The touch event object also looks interesting, as we can get the coordinates: http://www.javascriptkit.com/javatut...shtml#eventobj

Just an idea!
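A minimal sketch of that idea, assuming the standard `touchstart` event and the `once` listener option. `armTouchRedirect` is a made-up name, and nothing in this thread reports testing the approach on real traffic.

```javascript
// Hypothetical helper: runs `redirect` the first time a touchstart event
// reaches `target`, and never again. In a lander, `target` would be `document`.
function armTouchRedirect(target, redirect) {
  target.addEventListener("touchstart", () => redirect(), { once: true });
}

// Browser usage (YOUR_TRACKER_CTA_LINK is the placeholder from Code Block 4):
// armTouchRedirect(document, () => {
//   window.location.replace("YOUR_TRACKER_CTA_LINK");
// });
```

Two caveats: desktop visitors never fire touch events, so this only suits mobile-only campaigns, and a bot driving a real browser could still synthesise a touch.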

11-06-2014 01:50 AM #20 ckzhou (Member)

Originally Posted by caurmen

2: Find Bot-Free Placements, Carriers, etc

This one can't easily be run on a live campaign. It's more of a prospecting approach on a new traffic source or targeting to see which targets are just worth cutting from the outset. You can still run an offer with it, but it has to be direct-linked.

Copy the code below (Code Block 4) into a new text file. Replace "YOUR_TRACKER_CTA_LINK" with your tracker's CTA link as you'd usually put it in a landing page.
Save that as "prospect.html" or something similar, and upload it to your landing page hosting.
Set up a campaign on the traffic source you wish to test, using whatever offer you like, and using "prospect.html" as the only landing page.
Set the campaign live, and watch your stats like a hawk. As soon as any placement hits 60 clicks, pause it. Keep doing this until you either run out of traffic or have tested a good 20+ placements.

Sorry, just to check: can I still do a bot test without a landing page and web hosting?

11-06-2014 05:13 AM #21 zeno (Administrator)

No, you will need a landing page. It could potentially be on a CDN though.

The bot testing relies on functionality that can only happen in a browser that has loaded a page - direct-linking and simple redirection without landers won't be able to differentiate bots from humans as there is no behavioural input.

12-14-2014 06:56 AM #22 zeno (Administrator)

It depends... what traffic source?

12-15-2014 08:39 AM #23 avn_0903 (Member)

Pardon me if my question is too noobish, but do you place the <img src="broken" id="myimage"> after the </html> or the </body> at the bottom of the page?

12-15-2014 02:52 PM #24 zeno (Administrator)

You would put it somewhere in between the <body> </body> tags, otherwise it won't get rendered by a browser.

12-15-2014 05:36 PM #25 avn_0903 (Member)

Thanks for the clarification Zeno

12-15-2014 06:26 PM #26 maweniaran (Member)

Thanks Caurmen, fantastic study!
I always wondered how many bots there could be, but never found a way to test it.
So after your tutorial my curiosity put me into action immediately. And here are my results:

Method #1 with live LP:
281 clicks counted, 1001 clicks altogether. Around 72% bot traffic? Seems quite a lot. Maybe the code in the LP could be wrong (I'm not a programmer). Hmmm, any ideas?

Method #2 using redirects - same traffic source & settings, $20 worth of traffic
Overall CTR is 75%: overall 25% bot traffic? (Looks more believable than before)
Divided results into 5 groups according to CTR: 41% of sites have less than 80% humans? Damn!

Questions:
1) would you eliminate placements with 80% of human traffic or less?
2) why is there such a big difference between methods #1 and #2? (wrong code on the LP? any other idea?)
3) there are sites with, for example, 10 visits but 32 clicks (CTR = 320%). What does it mean?
4) Could the results really be true? Do you see similar numbers?

Would be glad for any comments and ideas regarding my testing

12-15-2014 06:39 PM #27 panicore (Member)

Originally Posted by maweniaran

1) would you eliminate placements with 80% of human traffic or less?

I have placements with 80% bot traffic that have decent ROI, why would you kill it?
Only kill spots where you know it's 100% bot traffic.

12-16-2014 05:29 AM #28 avn_0903 (Member)

I'm not sure what I did wrong, but I didn't get any clicks to my bitly link and I know not all of my traffic is bots. What I think happened was that my code was wrong somehow and the bot detection pixel could not be loaded. Attached are screenshots of my lander with the bot detection pixel not loaded (1). When I right click on the pixel and click view image, I get (2).

Can anyone tell me what happened?

12-16-2014 08:07 PM #29 zeno (Administrator)

Paste your entire page code (you can remove all the stuff inside the body that is your lander content).

12-24-2014 02:02 PM #30 jennatalia (AMC Alumnus)

I set up my bot detecting lander after some suspicious changes in conversions from my traffic source.

I split tested the bot landing page against direct linking, and the bot landing page has 50% fewer conversions than direct linking.

I'm currently running a test with multiple landing pages with progressively lower redirect times (50, 100, 150, 200, 250, 300) to assess whether or not the conversions are affected. Is there a reason 300ms was picked?

edit: It looks like the onload event doesn't need to have a delay.

http://www.w3schools.com/jsref/event_onload.asp

onload is most often used within the <body> element to execute a script once a web page has completely loaded all content (including images, script files, CSS files, etc.).

11-20-2015 04:16 AM #31 Mr Payne (Member)

Originally Posted by caurmen

The above techniques will work on mobile pop traffic as well - as will other simple techniques like looking at ISP names for hosting companies, or hiding a link from sight on your lander and watching for placements that manage to click through it anyway.

Ah ok, thanks. I thought I read earlier in this thread that you mentioned these methods aren't adequate for pop traffic.

But I'll set up the hidden link method. Thank you!

11-20-2015 07:25 AM #32 cbrughmans (Member)

To cut bot traffic and all other fraudulent traffic, I'd suggest http://forensiq.com

It's quite expensive, but it works great! It filters out all the traffic you really don't want on your campaigns. I'd say their accuracy at filtering out bot/fraud/invalid traffic is around 98-99%.

In addition to Forensiq, I'd also suggest implementing http://theparkingplace.com on all your campaigns. The Parking Place monetizes all traffic that does not comply with campaign requirements but is still valid/non-fraud traffic, e.g. traffic coming from a different country, users with an IP not permitted on the campaign, users on a different carrier (on mobile), etc. Really good eCPM!

11-20-2015 07:10 PM #33 bobliu (Member)

Originally Posted by cbrughmans

What success % are people seeing with services like this? are traffic sources refunding you? (which ones?)

02-10-2016 02:25 AM #34 taewoo (Member)

Originally Posted by caurmen

Called "honeypot" links.

So I made some advances on this in the past couple of months... This is a cat & mouse game that never ever freakin' ends, but it looks like it's quite possible to catch a HUGE chunk of them.

I've been building something quietly.

Some tips:

=> look at ISPs (as caurmen suggested) - no human ever comes from "Digital Ocean" or "Amazon Technologies" - https://www.iplocation.net/ gives you ISP by IP

=> IPs that come from class C subnet blocks of known proxies, VPNs, including those bastard Hola Networks (here's the story)

=> behavior of visitor, including honeypot, scroll, where they click, how often they visit, etc.
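The ISP tip can be sketched as a simple keyword match on the ISP name your tracker or an IP lookup reports. The function name and keyword list are assumptions for this example; the list is seeded only with hosting names mentioned in this thread, plus one spelling variant, and is nowhere near complete.

```javascript
// Hypothetical keyword list, seeded only with hosting names mentioned in this
// thread ("digitalocean" is an assumed spelling variant). A real list would be
// much longer and need regular maintenance.
const HOSTING_KEYWORDS = [
  "digital ocean",
  "digitalocean",
  "amazon technologies",
  "a1colo",
];

// Returns true when the ISP name contains any of the hosting keywords.
function looksLikeHostingIsp(ispName, keywords = HOSTING_KEYWORDS) {
  const normalised = String(ispName).trim().toLowerCase();
  return keywords.some((keyword) => normalised.includes(keyword));
}

console.log(looksLikeHostingIsp("Amazon Technologies Inc.")); // true
console.log(looksLikeHostingIsp("Some Mobile Carrier")); // false
```

A match is a reason to look closer rather than proof of fraud, since VPN users can also show up under hosting ISPs.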

Interesting note: Remember, other than scrapers... people using bots to defraud advertisers are, well, people. Look for properties (by referer URL and/or app IDs)... do some research into what other properties they own. The odds are, they're using bot traffic on those as well...

02-11-2016 05:52 PM #35 taewoo (Member)

On a random note, apparently AppNexus has bot filtering

After filtering for fraud, AppNexus transactions fell by 65 percent


Curious... has anyone buying mDSP traffic seen improvements? (i.e. fewer bots)

02-12-2016 05:12 PM #36 Glispa Team (Member)

@Caurmen, thanks for bringing attention to this topic. Bots, fraud, compliance and prevention have been hot topics at glispa for a while, especially with the recent and upcoming improvements that many third party tracking platforms have made and will make. There is indeed no easy solution, and we would recommend referring to this article by the CTO of Adjust, who gave very useful insights into non-user initiated traffic and how it is being combated.
