Prevent Bad Crawler Bots From Overloading Your Server!

Good day! We had some issues over the weekend at LiquidWeb: a large volume of crawling hitting a few specific websites. Here is a good practice to prevent this from happening.

—————————————————————-
robots.txt (Block only these bots)
—————————————————————-
User-agent: AhrefsBot
User-agent: MJ12bot
User-agent: SemrushBot
Disallow: /

—————————————————————-
robots.txt (Block all except Googlebot)
—————————————————————-
User-Agent: Googlebot
Allow: /
User-Agent: *
Disallow: /

This input will block access to your website for all bots except Googlebot.

In theory, at least. Many bots don't respect robots.txt, so it is a good idea to block them through the .htaccess file as well.
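Before deploying a robots.txt, you can sanity-check it with Python's built-in `urllib.robotparser`. This is a quick sketch (not part of the original config) that feeds it the "block all except Googlebot" file from above and asks which agents may fetch the site:

```python
from urllib.robotparser import RobotFileParser

# The "block all except Googlebot" robots.txt from above
robots_txt = """\
User-Agent: Googlebot
Allow: /

User-Agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Googlebot matches its own group and is allowed everywhere
print(parser.can_fetch("Googlebot", "/"))   # True
# Any other bot falls through to the * group and is denied
print(parser.can_fetch("AhrefsBot", "/"))   # False
```

Of course, this only tells you what a *well-behaved* bot would do with the file, which is exactly the limitation described above.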

—————————————————————-
.htaccess
—————————————————————-
RewriteEngine On
RewriteBase /

RewriteCond %{HTTP_USER_AGENT} .*Twice.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*Yand.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*Yahoo.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*Voil.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*libw.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*Java.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*Sogou.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*psbot.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*Exabot.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*boitho.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*ajSitemap.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*Rankivabot.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*DBLBot.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*MJ1.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*ask.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*AhrefsBot.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*Semrush.*
RewriteRule ^(.*)$ http://example.com/ [L,R=301]

Order Allow,Deny
Allow from all
Deny from 104.16.0.0/12
Deny from 110.0.0.0/8
Deny from 111.0.0.0/8
Deny from 112.0.0.0/5
Deny from 120.0.0.0/6
Deny from 124.0.0.0/8
Deny from 125.0.0.0/8
Deny from 147.0.0.0/8
Deny from 169.208.0.0
Deny from 175.0.0.0/8
Deny from 180.0.0.0/8
Deny from 182.0.0.0/8
Deny from 183.0.0.0/8
Deny from 202.0.0.0/8
Deny from 203.0.0.0/8
Deny from 210.0.0.0/8
Deny from 211.0.0.0/8
Deny from 218.0.0.0/8
Deny from 219.0.0.0/8
Deny from 220.0.0.0/8
Deny from 221.0.0.0/8
Deny from 222.0.0.0/8

# make your own list 😉 PlaySafe!
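Note that `Order`/`Allow`/`Deny` is Apache 2.2 syntax (on Apache 2.4 it needs mod_access_compat, or the equivalent `Require` directives). Either way, you can verify your ranges before deploying. This is a small Python sketch, not part of the original config, that checks a client IP against a few of the denied networks using the standard-library `ipaddress` module:

```python
import ipaddress

# A few of the denied ranges from the .htaccess list above
DENIED_NETWORKS = [ipaddress.ip_network(n) for n in (
    "104.16.0.0/12",
    "110.0.0.0/8",
    "112.0.0.0/5",
    "202.0.0.0/8",
)]

def is_denied(ip: str) -> bool:
    """Return True if the IP falls inside any denied network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DENIED_NETWORKS)

print(is_denied("104.20.1.1"))  # True: inside 104.16.0.0/12
print(is_denied("8.8.8.8"))     # False: not in any listed range
```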

Note: the final RewriteRule ^(.*)$ http://example.com/ [L,R=301] line will redirect all matching crawlers to http://example.com with a permanent (301) redirect.
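If you want to test which user agents your pattern list would catch before touching .htaccess, here is a rough Python equivalent of the RewriteCond chain. It assumes plain substring matching, which is what those `.*Fragment.*` patterns amount to (case-sensitive, since the rules above don't use the [NC] flag):

```python
# Same fragments as the RewriteCond lines above
BLOCKED_FRAGMENTS = [
    "Twice", "Yand", "Yahoo", "Voil", "libw", "Java", "Sogou", "psbot",
    "Exabot", "boitho", "ajSitemap", "Rankivabot", "DBLBot", "MJ1",
    "ask", "AhrefsBot", "Semrush",
]

def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent contains any blocked fragment."""
    return any(fragment in user_agent for fragment in BLOCKED_FRAGMENTS)

print(is_blocked("Mozilla/5.0 (compatible; AhrefsBot/7.0)"))  # True
print(is_blocked("Mozilla/5.0 (Windows NT 10.0; Win64)"))     # False
```

Running a day's worth of access-log user agents through a check like this is a cheap way to make sure your list catches the offenders without blocking legitimate visitors.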

Enjoy!