Robots.txt best practices & tips
-
Hey,
I was wondering if someone could give me some advice on whether I should block the robots.txt file from the average user (not from googlebot, yandex, etc)?
If so, how would I go about doing this? With .htaccess I'm guessing - but not an expert.
What can people do with the information in the file? Maybe someone can give me some "best practices"? (I have a wordpress based website)
Thanks in advance!
-
Asking about the ideal configuration for a robots.txt file for WordPress is opening a huge can of worms There's plenty of discussion and disagreement about exactly what's best, but a lot of it depends on the actual configuration and goals of your own website. That's too long a discussion to get into here, but below is what I can recommend as a pretty basic, failsafe version that should work for most sites:
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/Sitemap: http://www.yoursite.com/sitemap.xml
I always prefer to explicitly declare the location of my site map, even if it's in the default location.
There are other directives you can include, but they depend more on how you have handled other aspects of your website - e.g. trackbacks, comments and search results pages as well as feeds. This is where the list can get grey, as there are multiple ways to accomplish this, depending how your site is optimised, but here's a representative example.
Disallow: /trackback/
Disallow: /feed/
Disallow: /comments/
Disallow: /category//
Disallow: /trackback/
Disallow: /feed/
Disallow: /comments/
Disallow: /?
Disallow: /?Sorry I can't be more specific on the above example, but it's where things really come down to how you're managing your specific site, and are a much bigger discussion. A web search for "best WordPress robots.txt file" will certainly show you the range of opinions on this.
The key thing to remember with a robots.txt file is that it does not cause blocked URLs to be removed from the index, it only stops the crawlers from traversing those pages. It's designed to help the crawlers spend their time on the pages that you have declared useful, instead of wasting their time on pages that are more administrative in nature. A crawler has a limited amount of time to spend on your site, and you want it to spend that time looking at the valuable pages, not the backend.
Paul
-
Thanks for the detailed answer Paul!
Do you think there is anything I should block for a wordpress website? I blocked /admin.
-
There is really no reason to block the robots.txt file from human users, Jazy. They'll never see it unless they actively go looking for it, and even if they do, it's just directives for where you want the search crawlers to go and where you want them to stay away from.
The only thing a human user will learn from this, is what sections of your site you consider to be nonessential to a search crawler. Even without the robots file, if they were really interested in this information, they could acquire it in other ways.
If you're trying to use your robots.txt file to block information about pages on your website you want to keep private or you don't want anyone to know about, doing it in the robots.txt file is the wrong place anyway. (That's done in .htaccess, which should be blocked from human readers.)
There's enough complexity to managing a website, there's no reason to add more by trying to block your robots file from human users.
Hope that helps?
Paul
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Role of Robots.txt and Search Console parameters settings
Hi, wondering if anyone can point me to resources or explain the difference between these two. If a site has url parameters disallowed in Robots.txt is it redundant to edit settings in Search Console parameters to anything other than "Let Googlebot Decide"?
Technical SEO | | LivDetrick0 -
Tips to promote hotels site ?
I made site to book hotels: http://bilodeals.com/ any tips to promote him and get huge traffic ?
Technical SEO | | coinvideos80 -
HTTP Status showing up in opensiteexplorer top pages as blocked by robot.txt file
I am trying to find an answer to this question it has alot of url on this page with no data when i go into the data source and search for noindex or robot.txt but the site is visible in the search engines ?
Technical SEO | | ReSEOlve0 -
Best Implementation of a Title Tag
If My Targeted keyword are: Mussoorie Hotels Hotels in Mussoorie Mussoorie Resorts Resorts in Mussoorie What of the below 3 will be the best Title Tag After Panda and Penguine ? Hotels and Resorts in Mussoorie Mussoorie Hotels | Mussoorie Resorts | Luxury Budget & Economical Accommodation in Mussoorie Mussoorie Hotels, Mussoorie Resorts, Hotels in Mussoorie, Resorts in Musoorie please suggest!
Technical SEO | | WildHawk0 -
How to allow one directory in robots.txt
Hello, is there a way to allow a certain child directory in robots.txt but keep all others blocked? For instance, we've got external links pointing to /user/password/, but we're blocking everything under /user/. And there are too many /user/somethings/ to just block every one BUT /user/password/. I hope that makes sense... Thanks!
Technical SEO | | poolguy0 -
Best Practice to Remove a Blog
Note: Re-posting since I accidentally marked as answered Hi, I have a blog that has thousands of URL, the blog is a part of my site. I would like to obsolete the blog, I think the best choices are 1. 404 Them: Problem is a large number of 404's. I know this is Ok, but makes me hesitant. 2. meta tag no follow no index. This would be great, but the question is they are already indexed. Thoughts? Thanks PS A 301 redirect to the main page would be flagged as a soft 404
Technical SEO | | Bucky0 -
Search engines have been blocked by robots.txt., how do I find and fix it?
My client site royaloakshomesfl.com is coming up in my dashboard as having Search engines have been blocked by robots.txt, only I have no idea where to find it and fix the problem. Please help! I do have access to webmaster tools and this site is a WP site, if that helps.
Technical SEO | | LeslieVS0 -
Subdomains & SEO
Exact match domains are great for ranking but what about domains which contain just half of the full phrase being targeted? eg. If you owned the domain rentals.co.uk but wanted to target the search term "car rentals" Regarding backlinks, would it be best to link back to your rentals.co.uk homepage (using anchor text "car rentals") or to one of the following: a) www.rentals.co.uk/car-rentals b) car.rentals.co.uk AND 301 redirect to www.rentals.co.uk c) car.rentals.co.uk AND 301 redirect to www.rentals.co.uk/car-rentals
Technical SEO | | martyc1