Best practices for robots.txt -- allow one page but not the others?
-
So, we have a page like domain.com/searchhere, but its search results are being crawled (and shouldn't be); result URLs look like domain.com/searchhere?query1. If I block /searchhere?, will it also block crawlers from the single page /searchhere? (I still want that page to be indexed.)
What is the recommended best practice for this?
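For reference, the robots.txt rule being asked about would look like this (a sketch using the placeholder paths from the question, not a recommendation either way):

```text
User-agent: *
# Intended to match result URLs such as /searchhere?query1,
# but not the bare /searchhere page
Disallow: /searchhere?
```

For major crawlers, Disallow values are matched as prefixes against the path plus query string, so /searchhere on its own does not match /searchhere? and stays crawlable.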
-
SEOmoz used to use Google Search for the site. I am confident Google has a solid method for keeping their own results clean.
It appears SEOmoz recently changed their search widget. If you examine the URL you shared, notice none of the search results actually appear in the HTML of the page. For example, load the view-source URL and perform a find (CTRL+F) for "testing" which is the subject of the search. There are no results. Since the results are not in the page's HTML, they would not get indexed.
-
If Google is viewing the search result pages as soft 404s, then yes, adding the noindex tag should resolve the problem.
-
And, because Google can currently crawl these search result pages, a number of soft 404 pages are popping up. Would adding a noindex tag to these pages fix the issue?
-
Thanks for the links and help.
How does SEOmoz keep search results from being indexed? They don't block search results with robots.txt, and it doesn't appear that they add the noindex tag to the search result pages. (ex: view-source:http://www.seomoz.org/pages/search_results#stq=testing&stp=1)
-
Yeah, but Ryan's answer is the best one if you can go that route.
-
Hi Michelle,
The concept of crawl efficiency is highly misunderstood. Are all your site's pages being indexed? Are new content and changes indexed in a timely manner? If so, that would indicate your site is being crawled efficiently.
Regarding the link you shared, you are on the right track but need to dig a bit deeper. On the page you shared, find the discussion related to robots.txt. There is a link which will lead you to the following page:
https://developers.google.com/webmasters/control-crawl-index/docs/faq#h01
There you will find a more detailed explanation along with several examples of when not to use robots.txt.
robots.txt: Use it if crawling of your content is causing issues on your server. For example, you may want to disallow crawling of infinite calendar scripts. You should not use the robots.txt to block private content (use server-side authentication instead), or handle canonicalization (see our Help Center). If you must be certain that a URL is not indexed, use the robots meta tag or X-Robots-Tag HTTP header instead.
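The X-Robots-Tag header mentioned there can be set at the web server level, which is useful for non-HTML files that cannot carry a meta tag. A hedged Apache sketch, assuming mod_headers is enabled (the PDF pattern is just an example):

```text
# Keep PDF files out of the index without touching robots.txt
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>
```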
SEOmoz offers a great guide on this topic as well: http://www.seomoz.org/learn-seo/robotstxt
If you desire to go beyond the basic Google and SEOmoz explanation and learn more about this topic, my favorite article related to robots.txt, written by Lindsay, can be found here: http://www.seomoz.org/blog/serious-robotstxt-misuse-high-impact-solutions
-
-
Hi Ryan,
Wouldn't that cause issues with crawl efficiency?
Also, Google's webmaster guidelines say "Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines."
-
Thank you. Are you sure about that?
-
What about using the canonical URL tag? You could add a canonical link to the /searchhere? result pages pointing back to /searchhere.
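For illustration, a canonical link on a result page might look like this (the domain.com/searchhere URLs are the placeholders from the question):

```html
<!-- Placed in the <head> of domain.com/searchhere?query1 -->
<link rel="canonical" href="http://domain.com/searchhere">
```

Keep in mind that rel="canonical" is a consolidation hint rather than an indexing directive, so it is not guaranteed to keep the result pages out of the index the way noindex is, which is why other answers in this thread lean on the noindex tag.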
-
The best practice would be to add the noindex tag to the search result pages but not the /searchhere page.
Generally speaking, the best robots.txt file is an empty one; the file should only be used as a last resort for blocking content.
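A minimal sketch of that setup, assuming the result pages are ordinary HTML:

```html
<!-- On result pages such as /searchhere?query1 -->
<meta name="robots" content="noindex">
<!-- /searchhere itself omits the tag, so it remains indexable -->
```

Because this relies on the crawler fetching the page to see the tag, the result pages must not also be blocked in robots.txt.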
-
What you outlined sounds to me like it should work. Disallowing /searchhere? shouldn't disallow the top-level search page at /searchhere, but should disallow all the search result pages with queries after the ?.
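To make the prefix matching concrete, here is a small Python sketch; this is my own simplified illustration of the matching rule, not a real robots.txt parser (actual crawlers also handle Allow/Disallow precedence and the * and $ wildcards):

```python
def is_disallowed(path_and_query: str, rule: str) -> bool:
    """Simplified robots.txt check: a URL is disallowed when its
    path-plus-query string starts with the Disallow rule's value."""
    return path_and_query.startswith(rule)

rule = "/searchhere?"
print(is_disallowed("/searchhere?query1", rule))  # True: the result page is blocked
print(is_disallowed("/searchhere", rule))         # False: the search page itself is not
```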
Related Questions
-
Google for Jobs best practice for Job Boards?
I head up SEO for a niche job board. We disallowed our job ad pages (/job/) in the robots.txt as this is user-generated content and was really eating up our crawl budget, causing penalties etc. Now Google for Jobs has hit the UK (our strongest region for traffic), I'm torn about what to do next. Our jobs will only show in GfJ if we remove the jobs pages from the robots.txt, apply the directed structured data to every single jobs page, and monitor this constantly. I will also have to keep investing in our website developers noindexing/canonicalizing new job pages and paginations. Is GfJ worth it? I have spoken to one other job board who has seen more brand awareness from appearing in GfJ but almost no traffic/application increase. But are we missing a trick here? Any advice would be greatly appreciated.
Intermediate & Advanced SEO | gracekimberley11
-
Duplicate page title at bottom of page - ok, or bad?
Can I get your expert opinions? A few years ago, we customized our pages to repeat the page title at the bottom of the page. So the page title is in the breadcrumbs at the top, and it's also at the bottom of the page under all the content. Here is a sample page: bit.ly/1pYyrUl. I attached a screenshot and highlighted the second occurrence of the page title. I am worried that this might be keyword stuffing, or over-optimizing. Thoughts or advice on this? Thank you so much! Ron
Intermediate & Advanced SEO | yatesandcojewelers0
-
Best practices with recurring event listings
On our client's events page there are a few recurring events that each have their own detail page. I'm trying to figure out the best practice for minimising duplicate content. For example, for the Bribie Island Markets, which repeat weekly, there are 2 (+ more) detailed event pages: http://www.ourbribie.com/e/bribie-island-markets/1869/2013-12-07/2013-12-07 and http://www.ourbribie.com/e/bribie-island-markets/1869/2013-12-14/2013-12-14. While they both contain duplicated content, they're unique in that they display the specific event date/time. My thinking is that the future events (e.g. 2013-12-14) should have a canonical link to the upcoming/next event (i.e. 2013-12-07). However, this would require constantly updating/changing the canonical links. What's the best way to deal with this from a duplicate content perspective? Any better recommendations?
Intermediate & Advanced SEO | michaelp85
-
What's the best way to redirect categories & paginated pages on a blog?
I'm currently re-doing my blog and have a few categories that I'm getting rid of for housecleaning purposes and crawl efficiency. Each of these categories has many pages (some have hundreds). The new blog will also not have new relevant categories to redirect them to (1 or 2 may work). So what is the best place to properly redirect these pages to? And how do I handle the paginated URLs? The only logical place I can think of would be to redirect them to the homepage of the blog, but since there are so many pages, I don't know if that's the best idea. Does anybody have any thoughts?
Intermediate & Advanced SEO | kking41200
-
Google is displaying my page paths instead of URLs (page names)
Does anyone know why Google is displaying my page paths instead of the URLs in the search results? I discovered this while searching for one of my keywords; then I copied the link http://www.smarttouch.me/services-saudi/web-services/web-design and found all related results are the same. Could anyone tell me why that is, and does it really make a difference? Or is the URL display more important than the path display for SEO?
Intermediate & Advanced SEO | ali8810
-
Best Strategy to display 8 MB Images on Product Pages for Ecommerce
I have an ecommerce store that has a variety of images, including some super high quality images that are 8 MB. This style of image could be used for hundreds of products in the store. Does anyone have any tips on what I should be watching out for here? Is 8 MB too big to be usable?
Intermediate & Advanced SEO | LukeyJamo0
-
What is the best: one big site or several small ones?
Sometimes I've felt that my choice was incorrect. I know that with a large amount of content on the same subject a bigger site is better: it's easier to maintain, and a big number of pages is good for Google rankings. But when you have many small sites, each one focusing on a totally different subject, you can crosslink them and this will improve your rankings. Furthermore, many small sites allow you to focus on specific niches and help you rank better for different keywords. So, what is the best choice?
Intermediate & Advanced SEO | sergio_redondo0
-
Should I 301 Redirect Old Pages to Newer Ones?
I know there is value in having lots of unique content on our websites, but I'm wondering how long it should be kept for, and if there is any value in 301 redirecting it. So, for example, we have a number of pages on our website that are dedicated to single products (blue widget x, blue widget y, red widget x, red widget y): nice unique content, with some (but not many) links. These products are no longer available, though, and have been replaced. So I'm faced with three choices: 1. Leave it as it is, and hope it adds to the overall site authority (by virtue of being another page), and also perhaps mops up a few longer-tail keywords, adding a link to the replacement product on these pages; 2. 301 redirect these pages to the replacement products to give these a bit of a boost, and lose the content; 3. 301 redirect these pages to the replacement products and move all the old content to new 'blue widgets archive' and 'red widgets archive' pages? Would appreciate everyone's thoughts!
Intermediate & Advanced SEO | BigMiniMan0