Robots.txt on refinements
-
In dealing with Panda, do you think it is a good idea to put all refinements for category pages in the robots.txt file? We already have a lot of them set to noindex, follow, but I am wondering whether it would be better to address this from a crawl perspective, as the pages are probably thin, duplicate content to Google.
-
Hi There
In general, you probably don't need to do that. Here's how I would normally deal with indexation in WordPress (assuming you're using WordPress):
- Categories - index
- Tags - noindex
- Date archives - noindex
- Author (single author blogs) - noindex
- Author (multi-author) - index
- Subpages - noindex
Basically all these settings are shown in my post here on setting up WordPress: http://moz.com/blog/setup-wordpress-for-seo-success
Yoast is the best plugin to do all this with!
-
One of the most common mistakes I see in SEO... Nothing in robots.txt keeps pages from being indexed; it only controls crawling. In fact, blocking can do just the opposite. If you have existing pages to which you've added no-index, but you also block them with robots.txt, then the search crawler will never see them to pick up the no-index, and therefore won't know it's supposed to remove them. So they would still count against you as thin content even though they're not being crawled. NOT the result you're looking for.
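To make that conflict concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt rules, the `example.com` host, and the `/refine/` path for refinement pages are all hypothetical, not taken from the asker's site. It only shows that a disallowed URL is never fetched, which is why a no-index tag on that page can never be read.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: refinement pages are assumed to live under /refine/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /refine/
"""


def crawler_can_see_noindex(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the crawler may fetch the URL, and so can read its no-index tag."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


refinement = "https://example.com/refine/shoes-red"  # hypothetical refinement URL
category = "https://example.com/category/shoes"      # hypothetical category URL

# The refinement is blocked, so any no-index on it stays invisible to the crawler.
print(crawler_can_see_noindex(ROBOTS_TXT, refinement))  # False
print(crawler_can_see_noindex(ROBOTS_TXT, category))    # True
```

Note that `urllib.robotparser` does simple prefix matching, so this sketch avoids wildcard patterns.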
If you can no-index them, great. If not, at least use canonical tags to point them to the primary version of the category page. (Remember, no-index is a FAR stronger command to the search engines than canonical tags, which they take as "suggestions".)
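If you want to check which of the two signals a refinement page actually sends, a rough audit can be sketched with Python's standard `html.parser`. The sample HTML, the `example.com` URL, and the `RobotsHeadAudit` class name are illustrative assumptions; a real audit would fetch the live page first.

```python
from html.parser import HTMLParser


class RobotsHeadAudit(HTMLParser):
    """Collect the meta robots no-index flag and the canonical URL from a page."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            # content looks like "noindex, follow"; split it into directives.
            directives = [d.strip() for d in (attr.get("content") or "").lower().split(",")]
            if "noindex" in directives:
                self.noindex = True
        elif tag == "link" and (attr.get("rel") or "").lower() == "canonical":
            self.canonical = attr.get("href")


# Hypothetical <head> of a refinement page.
SAMPLE_HTML = """
<head>
<meta name="robots" content="noindex, follow">
<link rel="canonical" href="https://example.com/category/shoes">
</head>
"""

audit = RobotsHeadAudit()
audit.feed(SAMPLE_HTML)
print(audit.noindex, audit.canonical)
```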
The only time it's appropriate to block no-indexed pages with robots.txt is if you're absolutely certain the pages have never made it into the index in the first place. If they've never been indexed, you can no-index them for security, and then drop them behind the robots.txt to save crawl budget.
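Taken together, the advice above reads as a small decision rule. The sketch below is one interpretation of it, not an official procedure; the function name `refinement_plan` and the action labels are made up for illustration.

```python
def refinement_plan(can_noindex: bool, ever_indexed: bool) -> list:
    """Return the ordered actions for one refinement page (hypothetical helper)."""
    if not can_noindex:
        # Fallback: a canonical tag is only a hint, and the page must stay
        # crawlable for the tag to be read, so no robots.txt block here.
        return ["canonical to category page"]
    plan = ["noindex"]
    if not ever_indexed:
        # Blocking is safe only because there is nothing in the index to remove.
        plan.append("disallow in robots.txt")
    return plan


print(refinement_plan(can_noindex=True, ever_indexed=True))    # ['noindex']
print(refinement_plan(can_noindex=True, ever_indexed=False))   # ['noindex', 'disallow in robots.txt']
print(refinement_plan(can_noindex=False, ever_indexed=True))   # ['canonical to category page']
```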
Hope that makes sense?
Paul
-
I don't know if you have those taco commercials where you live, the ones with the little girl who says "Why not both?", but you might do that. It would not hurt, and it would help you sleep better at night.
Oh, here is a link to the commercial: https://www.youtube.com/watch?v=vqgSO8_cRio