SEO Best Practices regarding Robots.txt disallow

jamiegriz

I can't find clear, definitive guidance on the following issue:

It looks like the robots.txt file on my server has been set up to disallow the "account" and "search" pages on my site (Disallow: /Account/ and Disallow: /?search=), so Google Search Console is warning me that URLs are blocked by robots.txt. Do you recommend unblocking these URLs?
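
To see exactly which URLs those two rules cover, you could test them with Python's standard-library urllib.robotparser. This is a minimal sketch: the "User-agent: *" line, the www.example.com domain and the sample paths are assumptions for illustration. Note that robotparser applies the first matching rule and does not understand Google's * and $ wildcards, so treat it as an approximation of how Googlebot matches rules.

```python
from urllib.robotparser import RobotFileParser

# The two rules from the question; the "User-agent: *" line is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /Account/
Disallow: /?search=
"""

def blocked_urls(robots_txt, urls, agent="Googlebot"):
    """Return the URLs that the given robots.txt disallows for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]

# Hypothetical URLs on a placeholder domain.
urls = [
    "https://www.example.com/Account/login",       # matches Disallow: /Account/
    "https://www.example.com/?search=red+shoes",   # matches Disallow: /?search=
    "https://www.example.com/products/red-shoes",  # no rule matches
    "https://www.example.com/shop?search=boots",   # not matched: the rule is a prefix of the root path only
]

for url in blocked_urls(ROBOTS_TXT, urls):
    print("blocked:", url)
```

The last URL shows that Disallow: /?search= only catches search URLs on the root path; if search results also appear under other paths, this rule does not cover them. Paths are also case-sensitive, so /account/ in lowercase would not be matched by Disallow: /Account/.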

The warning says that over 18,000 URLs are blocked by robots.txt ("Sitemap contains urls which are blocked by robots.txt"). It seems like I wouldn't want that many URLs blocked, right?
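
One way to see where a "Sitemap contains urls which are blocked by robots.txt" warning comes from is to check every loc entry in the sitemap against the robots.txt rules. A minimal sketch, assuming the sitemap is a single standard urlset file (not a sitemap index) and using placeholder URLs on www.example.com; like any urllib.robotparser check, it approximates Google's matching rather than reproducing it exactly.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical inputs; in practice you would load your real robots.txt and sitemap.xml.
ROBOTS_TXT = """\
User-agent: *
Disallow: /Account/
Disallow: /?search=
"""

SITEMAP_XML = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/products/red-shoes</loc></url>
  <url><loc>https://www.example.com/?search=red+shoes</loc></url>
  <url><loc>https://www.example.com/?search=boots</loc></url>
</urlset>
"""

def sitemap_locs(sitemap_xml):
    """Extract every <loc> value from a <urlset> sitemap."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]

def blocked_sitemap_urls(robots_txt, sitemap_xml, agent="Googlebot"):
    """Return sitemap URLs that robots.txt prevents `agent` from crawling."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in sitemap_locs(sitemap_xml) if not parser.can_fetch(agent, url)]

blocked = blocked_sitemap_urls(ROBOTS_TXT, SITEMAP_XML)
print(f"{len(blocked)} sitemap URLs are blocked by robots.txt")
for url in blocked:
    print("  ", url)
```

If most of the flagged URLs turn out to be ?search= pages, that points to the rule (and the part of the sitemap) responsible for the warning.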

Thank you!!

mememax

Mmm, it depends.

It's really hard for me to answer without knowing your site, but I would say you're heading in the right direction. You want to give Google more ways to reach your quality content.

Now, do any other pages lead bots to those search pages through normal user navigation, or are they reachable only by running a search?

Google can crawl pages it discovers through internal or external links, but it can't reproduce searches by typing into the search box in your nav bar. So I doubt those pages are very valuable unless you link to them somehow; if you do, you may want to let Google keep crawling them.

Whether you want them "indexed" is a separate question. Because they are search results, they probably aggregate information that is already present elsewhere on the site. For indexing purposes, you may want to keep them out of the index while still letting the bot crawl through them.
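
A common way to keep crawlable pages out of the index is a noindex directive, sent either as a robots meta tag (meta name="robots" content="noindex") or as an X-Robots-Tag HTTP header. The page has to stay crawlable for this to work: if robots.txt blocks the URL, Google never fetches it and never sees the noindex. The sketch below assumes a Python WSGI app and treats any request with a "search" query parameter as a search page; the middleware and its parameter name are hypothetical.

```python
from urllib.parse import parse_qs

def noindex_search_pages(app, param="search"):
    """Hypothetical WSGI middleware: add "X-Robots-Tag: noindex" to search-result responses.

    Nothing here touches robots.txt, so the pages can still be crawled; the
    header only asks search engines not to index them.
    """
    def wrapped(environ, start_response):
        query = parse_qs(environ.get("QUERY_STRING", ""), keep_blank_values=True)
        is_search = param in query

        def start_with_header(status, headers, exc_info=None):
            # Append the header only for search requests; other pages are untouched.
            if is_search:
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_header)

    return wrapped

def demo_app(environ, start_response):
    """Stand-in for the real site: always returns a small HTML page."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<html><body>results</body></html>"]

application = noindex_search_pages(demo_app)
```

To try it locally, the standard library's wsgiref.simple_server.make_server("", 8000, application).serve_forever() will serve the wrapped app; requesting /?search=shoes should then return the extra header.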

Also beware of crawl budget: you don't want Google wandering through millions of search results instead of your money pages, unless you can limit it to crawling only a subset of them.

I hope this makes sense.

jamiegriz

Thank you for your response! I'm going to do a bit more research, but I think I will keep "account" disallowed and unblock "search". The search feature on my site pulls up quality content, so it seems like I would want those pages crawled. Does this sound logical to you?
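
Before deploying a change like that, it can help to check which URLs actually change status. A minimal sketch with urllib.robotparser, assuming the new file simply drops the Disallow: /?search= line and keeps everything else; the sample URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

CURRENT_RULES = """\
User-agent: *
Disallow: /Account/
Disallow: /?search=
"""

# Proposed version (assumption): keep /Account/ blocked, unblock search results.
PROPOSED_RULES = """\
User-agent: *
Disallow: /Account/
"""

def crawlability(robots_txt, urls, agent="Googlebot"):
    """Map each URL to True if `agent` may crawl it under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(agent, url) for url in urls}

def changed_urls(old_rules, new_rules, urls):
    """Return {url: (allowed_before, allowed_after)} for URLs whose status changes."""
    before = crawlability(old_rules, urls)
    after = crawlability(new_rules, urls)
    return {url: (before[url], after[url]) for url in urls if before[url] != after[url]}

sample_urls = [
    "https://www.example.com/Account/login",
    "https://www.example.com/?search=red+shoes",
    "https://www.example.com/products/red-shoes",
]

for url, (was, now) in changed_urls(CURRENT_RULES, PROPOSED_RULES, sample_urls).items():
    print(f"{url}: {'allowed' if was else 'blocked'} -> {'allowed' if now else 'blocked'}")
```

Running a sample of real URLs from the sitemap through this kind of diff makes it easy to confirm that only the search pages become crawlable and that /Account/ stays blocked.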

mememax

That could be completely normal. Google sends the warning because you're giving it conflicting directions: you're preventing it from crawling pages (via robots.txt) that you've asked it to index (via the sitemap).

Google doesn't know how important those pages are to you, so you're the one who needs to decide what to do next.

Are those pages important to you? Do you want them in the index? If so, change your robots.txt rule; if not, remove them from the sitemap.
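
If you keep the pages blocked, the sitemap should stop listing them. A minimal sketch, assuming the sitemap is generated in Python from a list of URLs: it writes a loc entry only for URLs the robots.txt rules allow, so the two files can't contradict each other. The function name, rules and URLs are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

ROBOTS_TXT = """\
User-agent: *
Disallow: /Account/
Disallow: /?search=
"""

def build_sitemap(urls, robots_txt, agent="Googlebot"):
    """Return sitemap XML containing only the URLs robots.txt lets `agent` crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        # Skip anything robots.txt disallows, so the sitemap never lists blocked URLs.
        if parser.can_fetch(agent, url):
            url_el = ET.SubElement(urlset, "url")
            ET.SubElement(url_el, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

pages = [
    "https://www.example.com/products/red-shoes",
    "https://www.example.com/?search=red+shoes",
    "https://www.example.com/Account/login",
]

print(build_sitemap(pages, ROBOTS_TXT))
```

Filtering at generation time means that a later robots.txt change (for example, unblocking search) is reflected in the next sitemap build without editing the sitemap by hand.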

About the previous answer: robots.txt is not used to block hackers; quite the opposite. Because the file is public, hackers can easily read which pages you'd like to block and visit them, since those may be key pages (e.g. wp-admin). But let's not focus on that: hackers have so many ways to find core pages that it's off topic. Robots.txt is normally used to avoid duplication issues and to keep Google from crawling low-value pages and wasting crawl budget.

TheKatzMeow

Typically, you only want robots.txt to block access points that could let hackers into your site, such as an admin page (e.g. www.examplesite.com/admin/). You definitely don't want it blocking your whole site. A developer or webmaster would be better placed to speak to the specifics, but that's the quick, high-level answer.
