Block bad crawlers
-
Hi! How are you?
I've been working on some of my sites and noticed that I'm getting lots of crawls from search engines that I'm not interested in ranking well in.
My question is the following: do you have a list of badly behaved search engines that use lots of bandwidth and don't send much (or good) traffic?
If so, do you know how to block them using robots.txt?
Thanks for the help!
Best wishes,
Ariel
-
Hey Ariel,
Here are a couple of lists of bots that some people are blocking. You should probably review your server data first to see which bots are actually visiting you and which of those you want to block.
In addition to the Moz resource Chris referenced, here are a couple more pages that might be useful for you:
- http://stackoverflow.com/questions/10793906/how-to-allow-known-web-crawlers-and-block-spammers-and-harmful-robots-from-scann
- http://www.distilled.net/u/robots-txt/
Good luck!
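For the "review your server data" step, here is a minimal Python sketch that totals requests and bytes per user agent. It assumes your access log is in the Apache/Nginx "combined" format; the sample lines and the bot name `ExampleBot` are made up for illustration, so adjust the pattern if your server logs differently.

```python
import re
from collections import defaultdict

# Assumed layout: combined log format, i.e.
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (\d+|-) "[^"]*" "([^"]*)"$'
)

def bandwidth_by_agent(log_lines):
    """Return {user_agent: [request_count, total_bytes]} for parseable lines."""
    totals = defaultdict(lambda: [0, 0])
    for line in log_lines:
        match = LOG_PATTERN.match(line.strip())
        if not match:
            continue  # skip lines that do not fit the assumed format
        size, agent = match.groups()
        totals[agent][0] += 1
        # A "-" in the size field means no response body was sent.
        totals[agent][1] += 0 if size == "-" else int(size)
    return dict(totals)

# Hypothetical sample data, not taken from any real log.
sample = [
    '203.0.113.5 - - [10/Oct/2013:13:55:36 +0000] "GET / HTTP/1.1" 200 2326 "-" "ExampleBot/1.0"',
    '203.0.113.5 - - [10/Oct/2013:13:55:40 +0000] "GET /a HTTP/1.1" 200 1000 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [10/Oct/2013:13:56:01 +0000] "GET / HTTP/1.1" 304 - "-" "Mozilla/5.0"',
]

report = bandwidth_by_agent(sample)
# Print the heaviest consumers first.
for agent, (hits, size) in sorted(report.items(), key=lambda kv: -kv[1][1]):
    print(f"{agent}: {hits} requests, {size} bytes")
```

To run it against a real log, pass an open file object instead of `sample`. Sorting by bytes shows which crawlers actually cost you bandwidth before you decide what to block.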
-
Chris gives a good answer, but is it really a problem? Bandwidth is very cheap these days; in fact, here in Australia most accounts are unlimited.
I host with Microsoft Azure and bandwidth is very cheap.
-
Ariel, you could start with the list shown here and tailor it to fit your needs if you're having problems with other bots: http://www.webmasterworld.com/search_engine_spiders/4579553.htm. There's info there on using robots.txt to block them, and you should also read this for info on using a robots.txt file: Robots.txt and Meta Robots - SEO Best Practices - Moz
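As a rough illustration of the robots.txt approach, the Python sketch below builds a robots.txt that disallows the whole site for a list of user agents and then checks the result with the standard library's `urllib.robotparser`. The names `ExampleBadBot` and `AnotherBadBot` are placeholders, not real crawlers; substitute the agents you find in your own logs.

```python
from urllib.robotparser import RobotFileParser

def build_robots_txt(blocked_agents):
    """Return robots.txt text that disallows the whole site for each agent."""
    lines = []
    for agent in blocked_agents:
        lines.append(f"User-agent: {agent}")
        lines.append("Disallow: /")
        lines.append("")  # blank line separates records
    # Every other crawler may fetch everything (an empty Disallow allows all).
    lines.append("User-agent: *")
    lines.append("Disallow:")
    return "\n".join(lines) + "\n"

# Placeholder bot names, assumed for this example only.
robots_txt = build_robots_txt(["ExampleBadBot", "AnotherBadBot"])
print(robots_txt)

# Verify the rules the way a compliant crawler would interpret them.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
blocked = not parser.can_fetch("ExampleBadBot", "/any/page.html")
others_allowed = parser.can_fetch("Googlebot", "/any/page.html")
print("ExampleBadBot blocked:", blocked)
print("Googlebot allowed:", others_allowed)
```

Keep in mind that robots.txt is only advisory: well-behaved crawlers honor it, but a bot that ignores it has to be blocked at the server level instead.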