{"id":2903,"date":"2020-01-27T10:05:57","date_gmt":"2020-01-27T15:05:57","guid":{"rendered":"https:\/\/easy-admin.ca\/?p=2903"},"modified":"2020-01-28T21:45:09","modified_gmt":"2020-01-29T02:45:09","slug":"prevent-bad-crawler-bots-to-overload-your-server","status":"publish","type":"post","link":"https:\/\/easy-admin.ca\/index.php\/2020\/01\/27\/prevent-bad-crawler-bots-to-overload-your-server\/","title":{"rendered":"Prevent Bad Crawler Bots from overloading your server!"},"content":{"rendered":"<p>Good day, we had some issues over the weekend at LiquidWeb! The problem was a large volume of crawling on some specific websites. Here is a good practice to prevent this from happening.<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<br \/>\n<strong>robots.txt (Block only these bots)<\/strong><br \/>\n&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<br \/>\nuser-agent: AhrefsBot<br \/>\nuser-agent: MJ12bot<br \/>\n<strong>user-agent: Semrushbot<\/strong><br \/>\ndisallow: \/<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<br \/>\n<strong>robots.txt (Block all except Google)<\/strong><br \/>\n&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<br \/>\nUser-Agent: Googlebot<br \/>\nAllow: \/<br \/>\nUser-Agent: *<br \/>\nDisallow: \/<\/p>\n<p>These rules block every bot except Googlebot from crawling your website.<\/p>\n<p><em><strong>In theory.<\/strong><\/em> Many bots ignore robots.txt, so it is a good idea to also block them through the .htaccess 
file.<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<br \/>\n<strong>.htaccess<\/strong><br \/>\n&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<br \/>\nRewriteEngine On<br \/>\nRewriteBase \/<\/p>\n<p>RewriteCond %{HTTP_USER_AGENT} .*Twice.* [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} .*Yand.* [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} .*Yahoo.* [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} .*Voil.* [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} .*libw.* [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} .*Java.* [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} .*Sogou.* [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} .*psbot.* [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} .*Exabot.* [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} .*boitho.* [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} .*ajSitemap.* [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} .*Rankivabot.* [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} .*DBLBot.* [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} .*MJ1.* [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} .*ask.* [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} .*AhrefsBot.* [OR]<br \/>\n<strong>RewriteCond %{HTTP_USER_AGENT} .*Semrush.*<\/strong><br \/>\nRewriteRule ^(.*)$ http:\/\/example.com\/ [L,R=301]<\/p>\n<p>Order Allow,Deny<br \/>\nAllow from all<br \/>\n<strong>Deny from 104.16.0.0\/12<\/strong><br \/>\nDeny from 110.0.0.0\/8<br \/>\nDeny from 111.0.0.0\/8<br \/>\nDeny from 112.0.0.0\/5<br \/>\nDeny from 120.0.0.0\/6<br \/>\nDeny from 124.0.0.0\/8<br \/>\nDeny from 125.0.0.0\/8<br \/>\nDeny from 147.0.0.0\/8<br \/>\nDeny from 169.208.0.0<br \/>\nDeny from 175.0.0.0\/8<br \/>\nDeny from 180.0.0.0\/8<br \/>\nDeny from 182.0.0.0\/8<br \/>\nDeny from 183.0.0.0\/8<br \/>\nDeny from 202.0.0.0\/8<br \/>\nDeny from 
203.0.0.0\/8<br \/>\nDeny from 210.0.0.0\/8<br \/>\nDeny from 211.0.0.0\/8<br \/>\nDeny from 218.0.0.0\/8<br \/>\nDeny from 219.0.0.0\/8<br \/>\nDeny from 220.0.0.0\/8<br \/>\nDeny from 221.0.0.0\/8<br \/>\nDeny from 222.0.0.0\/8<\/p>\n<p># Make your own list <em><strong>\ud83d\ude09 Play safe!<\/strong><\/em><\/p>\n<p><strong>Note:<\/strong> The final RewriteRule ^(.*)$ line sends every crawler matched by the conditions above to http:\/\/example.com\/ with a permanent redirect [L,R=301].<\/p>\n<p><strong>Enjoy!<\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Good day, we had some issues over the weekend at LiquidWeb! The problem was a large volume of crawling on some specific websites. Here is a good practice to prevent this from happening. &#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;- robots.txt (Block only these bots) &#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;- user-agent: AhrefsBot user-agent: MJ12bot user-agent: Semrushbot disallow: \/ &#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;- robots.txt (Block all except Google) &#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;- &hellip; <a href=\"https:\/\/easy-admin.ca\/index.php\/2020\/01\/27\/prevent-bad-crawler-bots-to-overload-your-server\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Prevent Bad Crawler Bots from overloading your 
server!<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":2905,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-2903","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-general"],"_links":{"self":[{"href":"https:\/\/easy-admin.ca\/index.php\/wp-json\/wp\/v2\/posts\/2903","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/easy-admin.ca\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/easy-admin.ca\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/easy-admin.ca\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/easy-admin.ca\/index.php\/wp-json\/wp\/v2\/comments?post=2903"}],"version-history":[{"count":0,"href":"https:\/\/easy-admin.ca\/index.php\/wp-json\/wp\/v2\/posts\/2903\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/easy-admin.ca\/index.php\/wp-json\/wp\/v2\/media\/2905"}],"wp:attachment":[{"href":"https:\/\/easy-admin.ca\/index.php\/wp-json\/wp\/v2\/media?parent=2903"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/easy-admin.ca\/index.php\/wp-json\/wp\/v2\/categories?post=2903"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/easy-admin.ca\/index.php\/wp-json\/wp\/v2\/tags?post=2903"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}