Home > Technical & Creative Skills > Programming, Servers & Scripts

How To: Block Cheeky Website Ripping Tools


11-08-2016 03:33 AM #1 nickpeplow (AMC Alumnus)
How To: Block Cheeky Website Ripping Tools

Getting your landing pages ripped sucks. Fortunately, the vast majority of affiliates are not particularly tech-savvy and use tools like HTTrack to do their dirty work.

Here is a simple trick to make life slightly harder for them. It's not totally foolproof, and it can be worked around by changing the ripping tool's settings... but ain't nobody got time for that.

Create a robots.txt file in your website's root directory and add the following:


# No thanks Google (or any other crawler that respects robots.txt)

User-agent: *
Disallow: /

# No thanks HTTrack etc

User-agent: Teleport
Disallow: /

User-agent: TeleportPro
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: WebBandit
Disallow: /

User-agent: WebZIP
Disallow: /

User-agent: WebReaper
Disallow: /

User-agent: WebStripper
Disallow: /

User-agent: Web Downloader
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: Offline Explorer Pro
Disallow: /

User-agent: HTTrack Website Copier
Disallow: /

User-agent: Offline Commander
Disallow: /

User-agent: Leech
Disallow: /

User-agent: WebSnake
Disallow: /

User-agent: BlackWidow
Disallow: /

User-agent: HTTP Weazel
Disallow: /
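
For anyone who wants to sanity-check the file before uploading it, Python's standard-library urllib.robotparser can evaluate the rules roughly the way a compliant crawler would. This is a minimal sketch, not a definitive test: it uses an abbreviated copy of the rules above, the is_allowed helper and the example.com URL are made up for illustration, and real crawlers may match user-agent names differently from Python's parser (which does a case-insensitive substring match against the part of the crawler's user-agent before the first "/").

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of the robots.txt above (wildcard group plus two ripper groups).
ROBOTS_TXT = """\
# No thanks Google
User-agent: *
Disallow: /

# No thanks HTTrack etc
User-agent: HTTrack Website Copier
Disallow: /

User-agent: WebCopier
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that honours robots.txt may fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines; blank lines separate the groups.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "WebCopier", "SomeOtherBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "https://example.com/lp/index.html"))
```

Run as-is, all three agents come back False: WebCopier matches its own group, while Googlebot and SomeOtherBot fall through to the User-agent: * group. Delete the wildcard group and Googlebot is allowed again while the named rippers stay blocked, which is the variant to consider if you care about SEO.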


11-08-2016 06:18 AM #2 cmdeal (Veteran Member)

Obviously, this should not be used by anyone doing SEO, since the wildcard rule blocks search engines too.


11-08-2016 07:40 AM #3 Mr Green (Administrator)

Nice share!


11-08-2016 07:49 AM #4 xkjonz (Member)

Would it block Adplexity too?


11-08-2016 07:51 AM #5 nickpeplow (AMC Alumnus)

Quote (originally posted by xkjonz):
Would it block Adplexity too?
No, it only works against the typical website ripping tools listed above.


11-08-2016 07:58 AM #6 Karika (Member)

Thanks for sharing! We only used the Google ones.


11-08-2016 01:56 PM #7 osmiumman (Member)

Quote (originally posted by nickpeplow):
User-agent: *
Disallow: /
Wouldn't it be enough to just use those 2 lines, as they block all user agents?


11-08-2016 04:09 PM #8 datle888 (Member)

Quote (originally posted by osmiumman):
Wouldn't it be enough to just use those 2 lines, as they block all user agents?
Yeah, it would be enough. Keep in mind that robots.txt is purely advisory: whether it works depends on whether the crawler respects the exclusions in the file. Google and Bing do; lots of scraping tools do not.
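
To make the "purely advisory" point concrete: the robots.txt check runs inside the crawler, not on the server. In this minimal sketch the hypothetical should_download helper only consults the rules when respect_robots is True, standing in for the ripper setting the opening post says can be used to work around the block (an assumption about how such tools behave). The server sends the same file either way and has no way to enforce it.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt quoted in the question above.
ROBOTS_LINES = ["User-agent: *", "Disallow: /"]


def should_download(user_agent: str, url: str, respect_robots: bool) -> bool:
    """Hypothetical crawler-side decision; nothing here runs on the server."""
    if not respect_robots:
        # A tool set to ignore robots.txt never even looks at the rules.
        return True
    parser = RobotFileParser()
    parser.parse(ROBOTS_LINES)
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    lp = "https://example.com/lp/"
    print("compliant crawler:", should_download("Googlebot", lp, respect_robots=True))
    print("ripper ignoring robots.txt:", should_download("SomeRipper", lp, respect_robots=False))
```

The first call returns False and the second True, even though both are looking at the same file.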


11-08-2016 05:05 PM #9 cmdeal (Veteran Member)

Quote (originally posted by osmiumman):
Wouldn't it be enough to just use those 2 lines, as they block all user agents?
Only if life were that simple.


11-08-2016 07:52 PM #10 fbqueen (Senior Member)

Nice share Nick! Thanks!


11-08-2016 08:38 PM #11 danielt (Member)

The above does look good; however, it only protects LPs that are not on a CDN. Requests to CDN-hosted LPs stop at the CDN level, meaning they are still crawlable/downloadable.
Depending on your CDN, you may or may not be able to add extra headers to the requests in order to break certain types of requests. In some cases the CDN will either copy your robots.txt or let you add your own (both are good options to have; they are not enabled by default on MaxCDN, for instance). CDNs and DNS providers also have tools that can help with this, BUUUUUT be aware: click loss will increase considerably.

Look at it as a funnel, like in AM: the more steps, the more traffic is lost.
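
On the CDN point: crawlers look for robots.txt at the root of the scheme and host (and port) that serves the page, so an LP served from a CDN hostname is covered by whatever robots.txt that CDN hostname returns, not necessarily the origin's copy. A minimal sketch of that lookup; robots_url_for and the hostnames are made up for illustration:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL a crawler would consult for page_url.

    robots.txt lives at the root of the same scheme + host (+ port) that
    serves the page, so a CDN hostname needs its own copy.
    """
    parts = urlsplit(page_url)
    # Keep scheme and netloc (host plus any port); replace path, drop query/fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url_for("https://lp.example.com/offer/index.html"))
    print(robots_url_for("https://cdn.example.net/offer/index.html?id=1"))
```

The second call prints https://cdn.example.net/robots.txt, so that is the file worth checking once your LPs are served from the CDN.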


11-15-2016 06:04 AM #12 affpayinggao (Veteran Member)

Great share, thanks.


11-15-2016 06:06 AM #13 nickpeplow (AMC Alumnus)

@danielt If you're pushing the whole site out over a CDN, then the original robots.txt will be present. Many CDNs also offer a custom override, so you can change it if you like.
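
If you want to confirm the CDN really is serving the same robots.txt as the origin (or that a custom override took effect), one quick check is to fetch both copies and compare them. A minimal standard-library sketch; the helper names are hypothetical, and it assumes both hosts return robots.txt with a 200 status (urlopen raises HTTPError on a 404, for example).

```python
from urllib.request import urlopen


def fetch_robots(base_url: str, timeout: float = 10.0) -> str:
    """Download <base_url>/robots.txt and return it as text."""
    url = base_url.rstrip("/") + "/robots.txt"
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def _normalise(text: str) -> str:
    # Ignore line-ending and leading/trailing whitespace differences only.
    return text.replace("\r\n", "\n").strip()


def cdn_matches_origin(origin_base: str, cdn_base: str) -> bool:
    """True if both hosts serve the same robots.txt (after normalising)."""
    return _normalise(fetch_robots(origin_base)) == _normalise(fetch_robots(cdn_base))
```

For example, call cdn_matches_origin("https://lp.example.com", "https://cdn.example.net") with your own origin and CDN URLs swapped in.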

