Best practices for robots.txt -- allow one page but not the others?
-
So, we have a page like domain.com/searchhere, but its results pages, which look like domain.com/searchhere?query1, are being crawled (and shouldn't be). If I block /searchhere? in robots.txt, will that also block crawlers from the single page /searchhere? I still want that page to be indexed.
What is the recommended best practice for this?
-
SEOmoz used to use Google Search for the site. I am confident Google has a solid method for keeping their own results clean.
It appears SEOmoz recently changed their search widget. If you examine the URL you shared, notice none of the search results actually appear in the HTML of the page. For example, load the view-source URL and perform a find (CTRL+F) for "testing" which is the subject of the search. There are no results. Since the results are not in the page's HTML, they would not get indexed.
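The URL from the question hints at the same thing: the search term sits after a `#`, which makes it a fragment rather than a query string, and browsers do not send the fragment to the server. A small Python sketch using the standard library's `urlsplit` shows the split. That the results are then filled in by client-side script is an assumption consistent with the empty view-source, not something SEOmoz has confirmed.

```python
from urllib.parse import urlsplit

# URL copied from the question in this thread.
url = "http://www.seomoz.org/pages/search_results#stq=testing&stp=1"
parts = urlsplit(url)

print(parts.path)      # /pages/search_results
print(parts.query)     # empty: there is no "?" part in this URL
print(parts.fragment)  # stq=testing&stp=1
```

Because the server only ever sees /pages/search_results, every search shares one HTML document, and the search term never reaches the markup that view-source displays.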
-
If Google is viewing the search result pages as soft 404s, then yes, adding the noindex tag should resolve the problem.
-
And, because Google can currently crawl these search result pages, a number of soft 404 pages are popping up. Would adding a noindex tag to these pages fix the issue?
-
Thanks for the links and help.
How does SEOmoz keep search results from being indexed? They don't block search results with robots.txt, and it doesn't appear that they add the noindex tag to the search result pages (ex: view-source:http://www.seomoz.org/pages/search_results#stq=testing&stp=1).
-
Yeah, but Ryan's answer is the best one if you can go that route.
-
Hi Michelle,
The concept of crawl efficiency is highly misunderstood. Are all your site's pages being indexed? Is new content or changes indexed in a timely manner? If so, that would indicate your site is being crawled efficiently.
Regarding the link you shared, you are on the right track but need to dig a bit deeper. On the page you shared, find the discussion related to robots.txt. There is a link which will lead you to the following page:
https://developers.google.com/webmasters/control-crawl-index/docs/faq#h01
There you will find a more detailed explanation along with several examples of when not to use robots.txt.
robots.txt: Use it if crawling of your content is causing issues on your server. For example, you may want to disallow crawling of infinite calendar scripts. You should not use the robots.txt to block private content (use server-side authentication instead), or handle canonicalization (see our Help Center). If you must be certain that a URL is not indexed, use the robots meta tag or X-Robots-Tag HTTP header instead.
SEOmoz offers a great guide on this topic as well: http://www.seomoz.org/learn-seo/robotstxt
If you desire to go beyond the basic Google and SEOmoz explanation and learn more about this topic, my favorite article related to robots.txt, written by Lindsay, can be found here: http://www.seomoz.org/blog/serious-robotstxt-misuse-high-impact-solutions
-
Hi Ryan,
Wouldn't that cause issues with crawl efficiency?
Also, webmaster guidelines say "Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines."
-
Thank you. Are you sure about that?
-
What about using the "Canonical URL" tag?
You can put the canonical tag code in /searchhere?page.
-
The best practice would be to add the noindex tag to the search result pages but not the /searchhere page.
Typically speaking, the best robots.txt file is a blank one. The file should only be used as a last resort with respect to blocking content.
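As a rough illustration of that rule, here is a minimal Python sketch that decides whether a page should carry the noindex tag. It assumes the search page lives at /searchhere (from the question) and that any non-empty query string means a results page; the function name `noindex_meta_tag` is made up for this example and is not part of any framework.

```python
from urllib.parse import urlsplit

SEARCH_PATH = "/searchhere"  # path taken from the question


def noindex_meta_tag(url: str) -> str:
    """Return a robots meta tag for search result pages, or "" otherwise."""
    parts = urlsplit(url)
    # Assumption: a results page is the search path plus a non-empty query.
    if parts.path == SEARCH_PATH and parts.query:
        return '<meta name="robots" content="noindex">'
    # The bare search page (and every other page) stays indexable by default.
    return ""


print(repr(noindex_meta_tag("http://domain.com/searchhere")))
print(repr(noindex_meta_tag("http://domain.com/searchhere?query1")))
```

The same decision could instead be sent as an X-Robots-Tag HTTP header. Either way, the pages must stay crawlable, since a crawler that is blocked by robots.txt never fetches the page and so never sees the noindex.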
-
What you outlined sounds to me like it should work. Disallowing /searchhere? shouldn't disallow the top-level search page at /searchhere, but should disallow all the search result pages with queries after the ?.
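To see why, here is a minimal Python sketch of the plain prefix matching that a Disallow rule implies. It is a simplification (no wildcards, no Allow rules, no user-agent groups), and the helper name `is_disallowed` is hypothetical; real crawlers implement more than this.

```python
from urllib.parse import urlsplit


def is_disallowed(url: str, disallow_prefixes: list[str]) -> bool:
    """Return True if the URL's path plus query starts with a Disallow prefix."""
    parts = urlsplit(url)
    # Rules are compared against the path and, when a "?" is present, the query.
    target = parts.path or "/"
    if "?" in url.split("#", 1)[0]:
        target += "?" + parts.query
    # An empty Disallow value blocks nothing, so empty prefixes are skipped.
    return any(target.startswith(prefix) for prefix in disallow_prefixes if prefix)


rules = ["/searchhere?"]
print(is_disallowed("http://domain.com/searchhere", rules))         # False
print(is_disallowed("http://domain.com/searchhere?query1", rules))  # True
```

The bare /searchhere is shorter than the rule, so it cannot start with /searchhere? and stays crawlable, while every URL with a query string after it matches.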