Robots.txt Blocking - Best Practices
-
Hi All,
We have a web provider who's not willing to remove the wildcard line of code blocking all agents from crawling our client's site (user-agent: *, Disallow: /). They have other lines allowing certain bots to crawl the site but we're wondering if they're missing out on organic traffic by having this main blocking line. It's also a pain because we're unable to set up Moz Pro, potentially because of this first line.
We've researched and haven't found a ton of best practices regarding blocking all bots, then allowing certain ones. What do you think is a best practice for these files?
Thanks!
User-agent: * Disallow: / User-agent: Googlebot Disallow: Crawl-delay: 5 User-agent: Yahoo-slurp Disallow: User-agent: bingbot Disallow: User-agent: rogerbot Disallow: User-agent: * Crawl-delay: 5 Disallow: /new_vehicle_detail.asp Disallow: /new_vehicle_compare.asp Disallow: /news_article.asp Disallow: /new_model_detail_print.asp Disallow: /used_bikes/ Disallow: /default.asp?page=xCompareModels Disallow: /fiche_section_detail.asp
-
Thanks for taking the time to respond in depth, GreenStone. We appreciate the advice and have passed your response along to the web hosting company (along with a frustrated email) explaining they're not adhering to anyone's best practices. Hopefully this will convince them!
-
Thanks, Dmitrii for your response! From our research we've seen similar recommendations and it helps to have more evidence to back it up. Hopefully these guys will give in a bit!
-
Completely agree, I really wouldn't want to host my stuff with a company that can't figure out what really the best practices are ;-). This is very well layed out why you shouldn't want to set up your robots.txt like it is right now.
-
In general, I definitely wouldn't recommend the way the web-provider is handling this.
- Disallowing all while adding exceptions should never be the norm. Allowing all to crawl while adding exceptions for other crawlers aside from google would be best practice generally,
- It makes a lot more sense to just allow crawlers full access, and then add crawl delays for non google crawlers, in addition to disallowing those specific sub-folders: Disallow: /new_vehicle_detail.asp Disallow: /new_vehicle_compare.asp Disallow: /news_article.asp Disallow: /new_model_detail_print.asp Disallow: /used_bikes/ Disallow: /default.asp?page=xCompareModels Disallow: /fiche_section_detail.asp.
- Googlebot Disallow: Crawl-delay: 5, does not do you any good, as google does not obey these commands. Only Search Console can control this.
- You can test what is visible to googlebot within search console's "robots" subsection, in order to verify what they can access.
- Disallowing all while adding exceptions should never be the norm. Allowing all to crawl while adding exceptions for other crawlers aside from google would be best practice generally,
-
Here is another video from Matt - https://www.youtube.com/watch?v=I2giR-WKUfY
Lots of good points there too.
-
Hi.
Super weird client - that's for sure.
User-agent: * Disallow: /
Every bot will be blocked off! how in the world are they ranking?
watch that video, there are good ideas of bot and crawlers controlling. As well as you can consider that as best practices. And yes, what they have now is ridiculous.
https://mza.seotoolninja.com/community/q/should-we-use-google-s-crawl-delay-setting
Here is a q/a about crawler delays. As far as I know Google ignores delays anyway, plus there is nothing good about it anyway.
Hope this helps.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Whats the best practice for acquisition?
Hi, My company have just bought out a competitor. We wan't to dissolve their website and if possible steal some of their link juice. The site hasn't got any spammy links or 404's so i'm not worried in that department. What I am not sure about is which of the following is best practice? a. Redirect every single page (even pages like /?checkout) to a relevant page on our website. b. Only redirect important pages, category pages, contact pages etc and leave the other pages to 404? c. Redirect the important pages to a relevant URL and redirect the less important pages to our homepage. d. Redirect the entire domain to our home page (i assume this isn't a good idea) e. Don't redirect any of the pages just delete the site.
Intermediate & Advanced SEO | | DannyHoodless0 -
If Robots.txt have blocked an Image (Image URL) but the other page which can be indexed has this image, how is the image treated?
Hi MOZers, This probably is a dumb question but I have a case where the robots.tags has an image url blocked but this image is used on a page (lets call it Page A) which can be indexed. If the image on Page A has an Alt tags, then how is this information digested by crawlers? A) would Google totally ignore the image and the ALT tags information? OR B) Google would consider the ALT tags information? I am asking this because all the images on the website are blocked by robots.txt at the moment but I would really like website crawlers to crawl the alt tags information. Chances are that I will ask the webmaster to allow indexing of images too but I would like to understand what's happening currently. Looking forward to all your responses 🙂 Malika
Intermediate & Advanced SEO | | Malika11 -
Best strategy for duplicate content?
Hi everyone, We have a site where all product pages have more or less similar text (same printing techniques, etc.) The main differences are prices and images, text is highly similar. We have around 150 products in every language. Moz's algorithm tells me to do something about duplicate content, but I don't really know what we could do, since the descriptions can't be changed to be very different. We essentially have paper bags in different colors and and from different materials.
Intermediate & Advanced SEO | | JaanMSonberg0 -
Why are these results being showed as blocked by robots.txt?
If you perform this search, you'll see all m. results are blocked by robots.txt: http://goo.gl/PRrlI, but when I reviewed the robots.txt file: http://goo.gl/Hly28, I didn't see anything specifying to block crawlers from these pages. Any ideas why these are showing as blocked?
Intermediate & Advanced SEO | | nicole.healthline0 -
List of Best SEO forums
Could I please get some input on a list of the best SEO forums out there besides SEOmoz? (Both mainstream and non mainstream)
Intermediate & Advanced SEO | | Luia0 -
What is the best canonical url to use for a product page?
I just helped a client redesign and launch a new website for their organic skin care company (www.hylunia.com). The site is built in Magento which by default creates MANY urls for each product. Which of these two do you think would be the best to use as the canonical version? http://www.hylunia.com/pure-hyaluronic-acid-solution
Intermediate & Advanced SEO | | danielmoss
or http://www.hylunia.com/products/face-care/facial-moisturizers/pure-hyaluronic-acid-solution ? I'm leaning on the latter, because it makes sense to me to have the breadcrumbs match the url string, and also it seems having more keywords in the url would help. However, it's obviously a very long url, and there might be some benefits to using the shorter version that I'm not aware of. Thanks in advance for sharing your thoughts. Best, Daniel0 -
How To Best Close An eCommerce Site?
We're closing down one of our eCommerce sites. What is the best approach to do this? The site has a modest link profile (a young site). It does have a run of site link to the parent site. It also has a couple hundred email subscribers and established accounts. Is there a gradual way to do this? How do I treat the subscribers and account holders? The impact won't be great, but I want to minimize collateral damage as much as possible. Thanks.
Intermediate & Advanced SEO | | AWCthreads0 -
XML Sitemap instruction in robots.txt = Worth doing?
Hi fellow SEO's, Just a quick one, I was reading a few guides on Bing Webmaster tools and found that you can use the robots.txt file to point crawlers/bots to your XML sitemap (they don't look for it by default). I was just wondering if it would be worth creating a robots.txt file purely for the purpose of pointing bots to the XML sitemap? I've submitted it manually to Google and Bing webmaster tools but I was thinking more for the other bots (I.e. Mozbot, the SEOmoz bot?). Any thoughts would be appreciated! 🙂 Regards, Ash
Intermediate & Advanced SEO | | AshSEO20110