Robots.txt Blocking - Best Practices
-
Hi All,
We have a web provider who's not willing to remove the wildcard line of code blocking all agents from crawling our client's site (user-agent: *, Disallow: /). They have other lines allowing certain bots to crawl the site but we're wondering if they're missing out on organic traffic by having this main blocking line. It's also a pain because we're unable to set up Moz Pro, potentially because of this first line.
We've researched and haven't found a ton of best practices regarding blocking all bots, then allowing certain ones. What do you think is a best practice for these files?
Thanks!
User-agent: * Disallow: / User-agent: Googlebot Disallow: Crawl-delay: 5 User-agent: Yahoo-slurp Disallow: User-agent: bingbot Disallow: User-agent: rogerbot Disallow: User-agent: * Crawl-delay: 5 Disallow: /new_vehicle_detail.asp Disallow: /new_vehicle_compare.asp Disallow: /news_article.asp Disallow: /new_model_detail_print.asp Disallow: /used_bikes/ Disallow: /default.asp?page=xCompareModels Disallow: /fiche_section_detail.asp
-
Thanks for taking the time to respond in depth, GreenStone. We appreciate the advice and have passed your response along to the web hosting company (along with a frustrated email) explaining they're not adhering to anyone's best practices. Hopefully this will convince them!
-
Thanks, Dmitrii for your response! From our research we've seen similar recommendations and it helps to have more evidence to back it up. Hopefully these guys will give in a bit!
-
Completely agree, I really wouldn't want to host my stuff with a company that can't figure out what really the best practices are ;-). This is very well layed out why you shouldn't want to set up your robots.txt like it is right now.
-
In general, I definitely wouldn't recommend the way the web-provider is handling this.
- Disallowing all while adding exceptions should never be the norm. Allowing all to crawl while adding exceptions for other crawlers aside from google would be best practice generally,
- It makes a lot more sense to just allow crawlers full access, and then add crawl delays for non google crawlers, in addition to disallowing those specific sub-folders: Disallow: /new_vehicle_detail.asp Disallow: /new_vehicle_compare.asp Disallow: /news_article.asp Disallow: /new_model_detail_print.asp Disallow: /used_bikes/ Disallow: /default.asp?page=xCompareModels Disallow: /fiche_section_detail.asp.
- Googlebot Disallow: Crawl-delay: 5, does not do you any good, as google does not obey these commands. Only Search Console can control this.
- You can test what is visible to googlebot within search console's "robots" subsection, in order to verify what they can access.
- Disallowing all while adding exceptions should never be the norm. Allowing all to crawl while adding exceptions for other crawlers aside from google would be best practice generally,
-
Here is another video from Matt - https://www.youtube.com/watch?v=I2giR-WKUfY
Lots of good points there too.
-
Hi.
Super weird client - that's for sure.
User-agent: * Disallow: /
Every bot will be blocked off! how in the world are they ranking?
watch that video, there are good ideas of bot and crawlers controlling. As well as you can consider that as best practices. And yes, what they have now is ridiculous.
https://mza.seotoolninja.com/community/q/should-we-use-google-s-crawl-delay-setting
Here is a q/a about crawler delays. As far as I know Google ignores delays anyway, plus there is nothing good about it anyway.
Hope this helps.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Best method for blocking a subdomain with duplicated content
Hello Moz Community Hoping somebody can assist. We have a subdomain, used by our CMS, which is being indexed by Google.
Intermediate & Advanced SEO | | KateWaite
http://www.naturalworldsafaris.com/
https://admin.naturalworldsafaris.com/ The page is the same so we can't add a no-index or no-follow.
I have both set up as separate properties in webmaster tools I understand the best method would be to update the robots.txt with a user disallow for the subdomain - but the robots text is only accessible on the main domain. http://www.naturalworldsafaris.com/robots.txt Will this work if we add the subdomain exclusion to this file? It means it won't be accessible on https://admin.naturalworldsafaris.com/robots.txt (where we can't create a file). Therefore won't be seen within that specific webmaster tools property. I've also asked the developer to add a password protection to the subdomain but this does not look possible. What approach would you recommend?0 -
Page A Best for Users, but B Ranks
This is real estate MLS listings related. I have a page "B" with lots of unique content (MLS thumbnails mixed with guide overview writing, pictures etc) which outranks "A" which is a page simply showing MLS thumbnails with map feature included. I am linking from "B" to "A" with anchor "KEYWORD for sale" to indicate to search engines that "A" is the page I want to rank, even though "B" has more unique content. It hasn't worked so far.
Intermediate & Advanced SEO | | khi5
Questions: Should I avoid linking from "B" to "A" as that could impact how well "B" ranks? Should I leave this setup and over time hope search engines will give "A" a chance to rank? Include some unique content on "A" mostly not viewable without clicking "Read more" link? I don't foresee many users will click "Read more" as they are really just looking for the properties for sale and do rarely care about written material when searching for "KEYWORD for sale". Should I "no index, follow" A as there are limited to none unique content and this could enhance chance of ranking better for B? When I write blog posts and it includes "KEYWORD for sale" should I link to "A" (best for users) or link to "B" since that page has more potential to rank really well and still is fairly good for users? Ranking for "B" is not creating a large bounce rate, just that "A" is even better. Thank you,
Kristian0 -
Best link analysis tool before moving to disavow them
I want to use Google disavow tool. I have 2 questions relating it 1. How to use Google Disavow tool? 2. How to use Moz or OSE for deep backlink mining? If Moz not best fit needs alternative for it.
Intermediate & Advanced SEO | | csfarnsworth0 -
Best practices with reoccurring event listings
On our client's events page there are a few reoccurring events that each have their own detail page. I'm trying to figure out what's the best practice for minimising duplicate content. For example, for the Bribie Island Markets that repeat weekly there are 2 (+more) detailed event pages: http://www.ourbribie.com/e/bribie-island-markets/1869/2013-12-07/2013-12-07
Intermediate & Advanced SEO | | michaelp85
http://www.ourbribie.com/e/bribie-island-markets/1869/2013-12-14/2013-12-14 While they both contain duplicated content, they're unique in that they display the specific events date/time. My thinking is that the future events (e.g. 2013-12-14) should have a canonical link to the upcoming/next event (i.e. 2013-12-07). However this would require constantly updating/changing the canonical links. What's the best way to deal with this from a duplicate content prospective? Any better recommendations?0 -
Can we really learn from the best?
Hi All, When I started my site (an eCommerce site) I copied (or tried) a lot of things from the best eCommerce sites I thought were out there. Sites like Zappos, ZALES, Overstock, BlueNile etc. I got hit pretty hard with latest algo changes and I posted my question at Google Webmaster Help forum I received answers from Gurus that we are keyword stuffing etc. (mainly with internal links to product pages but other issues as well). My answer was a link to Zappos and other sites showing that what we do is nothing compared to them. I also showed dozens of SEO "errors" like using H1 tag 10 times per page, not using canonicals and many other issues. The Guru's answer was "LOL" - who am I to compare myself to Zappos. So the question is... Can we take them for example or are they first simply because they are the biggest?
Intermediate & Advanced SEO | | BeytzNet0 -
Our Site's Content on a Third Party Site--Best Practices?
One of our clients wants to use about 200 of our articles on their site, and they're hoping to get some SEO benefit from using this content. I know standard best practices is to canonicalize their pages to our pages, but then they wouldn't get any benefit--since a canonical tag will effectively de-index the content from their site. Our thoughts so far: add a paragraph of original content to our content link to our site as the original source (to help mitigate the risk of our site getting hit by any penalties) What are your thoughts on this? Do you think adding a paragraph of original content will matter much? Do you think our site will be free of penalty since we were the first place to publish the content and there will be a link back to our site? They are really pushing for not using a canonical--so this isn't an option. What would you do?
Intermediate & Advanced SEO | | nicole.healthline1 -
Blocking HTTP 1.0?
One of my clients believes someone is trying to hack their site. We are seeing the requests with a server protocol or HTTP 1.0 so they want to block 1.0 entirely. Will this cause any problems with search engines or regular, non-spamming visitors?
Intermediate & Advanced SEO | | BryanPhelps-BigLeapWeb0