Block bad crawlers
-
Hi! How are you?
I've been working on some of my sites and noticed that I'm getting lots of crawls from search engines that I'm not interested in ranking well in.
My question is the following: do you have a list of 'badly behaved' search engines that take lots of bandwidth and don't send much (or good) traffic?
If so, do you know how to block them using robots.txt?
Thanks for the help!
Best wishes,
Ariel
-
Hey Ariel,
Here are a couple of lists of bots that some people are blocking. You should probably review your server data first to see which of the bots visiting you are ones you actually want to block.
In addition to the Moz resource Chris referenced, here are a couple more pages that might be useful for you:
- http://stackoverflow.com/questions/10793906/how-to-allow-known-web-crawlers-and-block-spammers-and-harmful-robots-from-scann
- http://www.distilled.net/u/robots-txt/
Good luck!
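To make the "review your server data" step concrete, here is a rough Python sketch that tallies requests and bytes per user agent. It assumes your access log is in the common Apache/Nginx "combined" format; the sample lines, the `ExampleBot` name and the `tally_user_agents` helper are made up for illustration, so adjust the pattern to whatever your server actually writes.

```python
import re
from collections import Counter

# Assumed layout (combined log format): ... "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-) "[^"]*" "(?P<agent>[^"]*)"\s*$'
)

def tally_user_agents(lines):
    """Return (hits, bytes_sent) Counters keyed by user-agent string."""
    hits, bytes_sent = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        agent = match.group("agent")
        size = match.group("bytes")
        hits[agent] += 1
        bytes_sent[agent] += 0 if size == "-" else int(size)
    return hits, bytes_sent

# Hypothetical sample lines; in practice, pass an open log file instead.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2013:13:55:36 +0000] "GET / HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
    '203.0.113.5 - - [10/Oct/2013:13:55:37 +0000] "GET /a HTTP/1.1" 200 2048 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [10/Oct/2013:13:56:01 +0000] "GET / HTTP/1.1" 304 - "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    hits, bytes_sent = tally_user_agents(SAMPLE_LOG)
    # List the heaviest bandwidth consumers first.
    for agent, total in bytes_sent.most_common():
        print(f"{agent}: {hits[agent]} requests, {total} bytes")
```

Sorting by bytes rather than by hits puts the crawlers that cost the most bandwidth at the top, which is the thing Ariel was asking about.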
-
Chris gives a good answer, but is it really a problem? Bandwidth is very cheap these days; in fact, here in Australia most accounts are unlimited.
I host with Microsoft Azure and bandwidth is very cheap.
-
Ariel, you could start with the list shown here and tailor it to fit your needs if you're having problems with others: http://www.webmasterworld.com/search_engine_spiders/4579553.htm. There's info there on using robots.txt to block them, and you should also read this for info on using the robots.txt file: Robots.txt and Meta Robots - SEO Best Practices - Moz
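As a minimal sketch of what a blocking rule looks like, the Python snippet below embeds a small robots.txt and checks it with the standard library's `urllib.robotparser` before you upload it. `ExampleBot` and the example.com URL are placeholders, not a real crawler from any of the lists above; substitute the user-agent tokens you find in your own logs. Keep in mind that robots.txt is only advisory: a crawler that ignores it has to be blocked at the server level instead.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one crawler, leave everyone else alone.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow:
"""

def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    page = "http://www.example.com/page.html"
    for agent in ("ExampleBot", "Mozilla/5.0"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, page) else "blocked"
        print(f"{agent}: {verdict}")
```

An empty `Disallow:` line means "nothing is disallowed", so the second group keeps the site open to every crawler not named explicitly.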