Robots.txt vs. meta noindex, follow
-
Hi guys,
I wander what your opinion is concerning exclution via the robots.txt file.
Do you advise to keep using this? For example:User-agent: *
Disallow: /sale/*
Disallow: /cart/*
Disallow: /search/
Disallow: /account/
Disallow: /wishlist/*Or do you prefer using the meta tag 'noindex, follow' instead?
I keep hearing different suggestions.
I'm just curious what your opinion / suggestion is.Regards,
Tom Vledder -
Hi Tom
Agree with Martijn that it depends for example, the robots.txt is generally the first port of call for bots as it allows them to understand where you want them to spend their finite time crawling your site. You can aslo give direction to all bots at once or specify a subset. It is generally the best option for blocking pages such as you /cart/ etc were they don't need crawling.
The problem with robots.txt is that it doesn't always keep pages from being indexed especially if there are other external sources linking to the pages in question.
The meta tag noindex on the other hand can be applied to individual pages and you are actually commanding the robots to NOT Index the relevant page in serps, use this option if you have pages you don't want appearing in Google (or other search engines) but the page may still be relevant for authority or able to acquire links (make sure to use Noindex follow) as you still want the robots to crawl the page. Otherwise use Noindex Nofollow hope that this helps.
-
Hi Tom,
It depends, for the /sale/ I would make an exception to make sure that it could be sales pages. But for the other pages I wouldn't want a search engine to waste any crawl budget by looking at these pages for a start. That's why I would go there with a robots.txt implementation instead of META robots as then they'll still visit the page to figure out there they won't index the page.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robots.txt and Multiple Sitemaps
Hello, I have a hopefully simple question but I wanted to ask to get a "second opinion" on what to do in this situation. I am working on a clients robots.txt and we have multiple sitemaps. Using yoast I have my sitemap_index.xml and I also have a sitemap-image.xml I do put them in google and bing by hand but wanted to have it added into the robots.txt for insurance. So my question is, when having multiple sitemaps called out on a robots.txt file does it matter if one is before the other? From my reading it looks like you can have multiple sitemaps called out, but I wasn't sure the best practice when writing it up in the file. Example: User-agent: * Disallow: Disallow: /cgi-bin/ Disallow: /wp-admin/ Disallow: /wp-content/plugins/ Sitemap: http://sitename.com/sitemap_index.xml Sitemap: http://sitename.com/sitemap-image.xml Thanks a ton for the feedback, I really appreciate it! :) J
Technical SEO | | allstatetransmission0 -
Is there a benefit to Microdata vs. RDFa Lite?
Is there any community consensus about whether Microdata or RDFa Lite is the superior rich-snippet format? I work as a design/front-end-developer and in terms of pure coding, RDFa Lite seems the superior method. It looks to be more flexible and more extensible. The W3C spec is also more mature—it's a W3C Recommendation where Microdata is only a W3C Working Draft—so it's more likely to reach full standardization sooner. Also, because it's a Recommendation it's less likely to change. However, I hear Google "strongly recommends" the use of Microdata. Do they not support RDFa/RDFa Lite? There doesn't seem to be a great deal of discussion on this anywhere so I'm tempted to think it's sort of irrelevant. I am aware that Schema.org is, supposedly, now supporting RDFa Lite.
Technical SEO | | kongregate0 -
How many times robots.txt gets visited by crawlers, especially Google?
Hi, Do you know if there's any way to track how often robots.txt file has been crawled? I know we can check when is the latest downloaded from webmaster tool, but I actually want to know if they download every time crawlers visit any page on the site (e.g. hundreds of thousands of times every day), or less. thanks...
Technical SEO | | linklater0 -
Robots.txt Showing in SERP Results
Currently doing a technical audit for a website and when I search "Site:website.com -www" the only result is website.com/robots.txt I was wondering if anyone else has come across this before -- or what this may mean from a technical audit standpoint. Thank you!
Technical SEO | | vectormedia0 -
Robots.txt not working?
Hello This is my robots.txt file http://www.theprinterdepo.com/Robots.txt However I have 8000 warnings on my dashboard like this:4 What am I missing on the file¿ Crawl Diagnostics Report On-Page Properties <dl> <dt>Title</dt> <dd>Not present/empty</dd> <dt>Meta Description</dt> <dd>Not present/empty</dd> <dt>Meta Robots</dt> <dd>Not present/empty</dd> <dt>Meta Refresh</dt> <dd>Not present/empty</dd> </dl> URL: http://www.theprinterdepo.com/catalog/product_compare/add/product/100/uenc/aHR0cDovL3d3dy50aGVwcmludGVyZGVwby5jb20vaHAtbWFpbnRlbmFjZS1raXQtZm9yLTQtbGo0LWxqNS1mb3ItZXhjaGFuZ2UtcmVmdWJpc2hlZA,,/ 0 Errors No errors found! 1 Warning 302 (Temporary Redirect) Found about 5 hours ago <a class="more">Read More</a>
Technical SEO | | levalencia10 -
Invisible robots.txt?
So here's a weird one... Client comes to me for some simple changes, turns out there are some major issues with the site, one of which is that none of the correct content pages are showing up in Google, just ancillary (outdated) ones. Looks like an issue because even the main homepage isn't showing up with a "site:domain.com" So, I add to Webmaster Tools and, after an hour or so, I get the red bar of doom, "robots.txt is blocking important pages." I check it out in Webmasters and, sure enough, it's a "User agent: * Disallow /" ACK! But wait... there's no robots.txt to be found on the server. I can go to domain.com/robots.txt and see it but nothing via FTP. I upload a new one and, thankfully, that is now showing but I've never seen that before. Question is: can a robots.txt file be stored in a way that can't be seen? Thanks!
Technical SEO | | joshcanhelp0 -
Root vs. Index.html
Should I redirect index.html to "/" or vice versa? Which is better for duplicate content issues?
Technical SEO | | DavetheExterminator0 -
Should i use NoIndex, Follow & Rel=Canonical Tag In One Page?
I am having pagination problem with one of my clients site , So I am deciding to use noindex, follow tag for the Page 2,3,4 etc for not to have duplicated content issue, Because obviously SEOMoz Crawl Diagnostics showing me lot of duplicate page contents. And past 2 days i was in constant battle whether to use noindex, follow tag or rel=canonical tag for the Page 2,3,4 and after going through all the Q&A,None of them gives me crystal clear answer. So i thought "Why can't i use 2 of them together in one page"? Because I think (correct me if i am wrong) 1.noindex, follow is old and traditional way to battle with dup contents
Technical SEO | | DigitalJungle
2.rel=canonical is new way to battle with dup contents Reason to use 2 of them together is: Bot finds to the non-canonical page first and looks at the tag nofollow,index and he knows not to index that page,meantime he finds out that canonical url is something something according to the url given in the tag,NO? Help Please???0