No indexing url including query string with Robots txt
-
Dear all,
how can I block url/pages with query strings like page.html?dir=asc&order=name with robots txt?
Thanks!
-
Dear all, what is the best option? And are the option below good? A: Disallow
- sort-order (Only URLs with value = asc)
"A single URL may contain many parameters for each of which you can specify settings. More restrictive settings override less restrictive settings. For example, here are three parameters and their settings"
source:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1235687
B: User-agent:
Googlebot Disallow: /*.=name$
for example www.sub.domain.com/collection.html?dir=desc&order=name source: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449
Thanks!
-
You could always just use rel="canonical" which would be much better than completely blocking all URL parameters.
-
Hey,
Should that second URL be www.sub.domain.com/collection/adresboeken.html?whatever=something If so, then by using /collection/?* you are saying that anything within /collection/ with a query string should not be indexed. If adresboeken.html always has a query string, it may not get indexed.
The other options I'd consider before using robots.txt are telling Google to ignore dir=desc&order=color in Google Webmaster Tools parameter handling. This is the best way to handle query string issues. (Assuming you are trying to influence Google. Clearly Google Webmaster Tools won't affect Bing!)
Another idea is to set a canonical URL on /collection/adresboeken.html referencing /collection/adresboeken.html without the query string. This tells the search engines that the query strings do not make a unique URL. (adresboeken.html?dir=desc&order=color is the same as adresboeken.html?dir=desc&order=price is the same as adresboeken.html?dir=asc&order=color is the same as adresboeken.html, and so on).
I hope that helps. Thanks,
Matthew -
Hi,
Robots.txt works mainly on 2 rules. Those are User-agent: and Disallow:
User-agent: the name of the robot you need to block
Disallow: the url or folder or other url with conditions you need to block.
As you have asked in your question you need to block a url with a condition. But you have to remember that Robot.txt is giving so critical results if you did not use it correctly.
Anyway in your question, you wanted to block url/pages with query strings like page.html?dir=asc&order=name
so you have to use following:
User-agent: *
Disallow: /*?
So the above will block all the urls with a question mark (?) for all the search robots. This will not block only page.html?dir=asc&order=name it will alos block comments.html?dir=asc&order=name
So use it so carefully.
Hope this is the what you have looked for. If need more help you may ask.
Regards
Prasad
-
Dear all,
thanks for responding. If I have a pages like
1. www.sub.domain.com/collection.html exists, I want to index it, and
2. www.sub.domain.com/collection.html?dir=desc&order=color which I don't want to index
Is this the way to do this in de robots.txt?:
Disallow: /collection/?*
Thanks!
-
Hi,
Here is an article explaining how to do this in robots.txt:
http://sanzon.wordpress.com/2008/04/29/advanced-usage-of-robotstxt-w-querystrings/Depending on what you are trying to do, it might also be worth investigating parameter handling in Google Webmaster Tools:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1235687Thanks,
Matthew
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Disallow wildcard match in Robots.txt
This is in my robots.txt file, does anyone know what this is supposed to accomplish, it doesn't appear to be blocking URLs with question marks Disallow: /?crawler=1
Technical SEO | | AmandaBridge
Disallow: /?mobile=1 Thank you0 -
To include / at the end of a URL or not
Hi I have recently noticed my site works with / and the end of a URL and without. I wanted to know if there is any SEO impact on this? Will it be seen as 2 different pages? if so what is the best option to go for www.mydomain.com/page/ or www.mydomain.com/page Thanks E
Technical SEO | | Direct_Ram0 -
What is the best practice to re-index the de-indexed pages due to a bad migration
Dear Mozers, We have a Drupal site with more than 200K indexed URLs. Before 6 months a bad website migration happened without proper SEO guidelines. All the high authority URLs got rewritten by the client. Most of them are kept 404 and 302, for last 6 months. Due to this site traffic dropped more than 80%. I found today that around 40K old URLs with good PR and authority are de-indexed from Google (Most of them are 404 and 302). I need to pass all the value from old URLs to new URLs. Example URL Structure
Technical SEO | | riyas_
Before Migration (Old)
http://www.domain.com/2536987
(Page Authority: 65, HTTP Status:404, De-indexed from Google) After Migration (Current)
http://www.domain.com/new-indexed-and-live-url-version Does creating mass 301 redirects helps here without re-indexing the old URLS? Please share your thoughts. Riyas0 -
Writing of url query strings to be seo frinedly
I understand the basic concepts of url write and creating inbound and outbound rules. I understand the creating of rules to rewrite url query strings so that it’s readable and seo friendly. It’s simple when dealing with a small number of pages and database records. (Microsoft Server, asp.net 4.0, IIS 7) However, I need to understand the concept to handle this: Viz the following: We have a database of 10,000+ establishments, 650+ cities, 400+ suburbs. Each establishment can be searched for by country, province, city and suburb. The search results show establishments that match the search criteria. Each establishment has its own unique id. Each establishment in the search results table has a link to the establishments detailed profile aspx page. The link is a query string such as http://www.ubuntustay.com/detailed.aspx?id=4 which opens the establishments profile. We need to rewrite the url to be something like: http://www.ubuntustay.com/detailed.aspx/capetown/westerncape/capetown/campsbay/diamondhouse which should still open the same establishment profile as the above query string. I can manually create a rule for this one example above without a problem. But there are over 10,000 establishments, all in different provinces, cities and suburbs. Surely we don’t manually generate a rewrite rule for each establishment? The resulting .htaccess will be rather large(?!) Therefore my questions are: How do I create url rewrite rules for dynamic query strings that originate from a large dataset? How do I translate the id number into the equivalent <country>/<province>/<city>/<suburb>/ <establishment>syntax?</establishment></suburb></city></province></country> Do I have to wire-up the global.asax so that every incoming requests extracts the country, province, city and suburb based on the establishment id which seem a bit cumbersome(?). If you’re wondering how I currently do it (it works but it’s not very portable or efficient): For each establishment which is included on the search results I simply construct the link url as: http://www.ubuntustay.com/detailed.aspx/4/Diamond%20House/Camps%20Bay/Cape%20Town On the detailed.aspx page load I simply extract the record id (4 in the example above) from the querystring and select that record from the db. Claude, what I’m looking for is advice on the best approach on how to create these rewrite rules and would be grateful if you can have one of your SEO friends lend their advice and experience. Any web resources that show the above techniques would be great. I’m not really looking for simple web links to url rewriting overviews…I have plenty of those. It’s the detail on the specific requirement above that I need please.
Technical SEO | | claudeSteyn0 -
Changing all urls
A client of mine has a wordpress website that is installed in a directory, called "site". So when you go to www.domain.com you are redirected to www.domain.com/site. We all know how bad it is to have a redirect fron your subdomain to another page. In this case I measured a loss of 5 points of page authority. The question is: what is the best practice to remove the "site" from the address and changing all the urls? Should I use the webmaster tool to tell to Google that the site is moving? It's not 100% true, cause the site is just moving one level up. Should I install a copy of the website under www.domain.com and just redirect 301 every old page to its new url? This way I think the site would be deindexet for 2/3 months. Any suggestions or tips welcome! Thanks DoMiSol
Technical SEO | | DoMiSoL0 -
Confirming Robots.txt code deep Directories
Just want to make sure I understand exactly what I am doing If I place this in my Robots.txt Disallow: /root/this/that By doing this I want to make sure that I am ONLY blocking the directory /that/ and anything in front of that. I want to make sure that /root/this/ still stays in the index, its just the that directory I want gone. Am I correct in understanding this?
Technical SEO | | cbielich0 -
Robots.txt and 301
Hi Mozzers, Can you answer something for me please. I have a client and they have 301 re-directed the homepage '/' to '/home.aspx'. Therefore all or most of the linkjuice is being passed which is great. They have also marked the '/' as nofollow / noindex in the Robots.txt file so its not being crawled. My question is if the '/' is being denied access to the robots is it still passing on the authority for the links that go into this page? It is a 301 and not 302 so it would work under normal circumstances but as the page is not being crawled do I need to change the Robots.txt to crawl the '/'? Thanks Bush
Technical SEO | | Bush_JSM0 -
Including spatial location in URL structure. Does subfolder number and keyword order actually matter?
The SEOMoz On-Page report for my site brings up one warning (among others) that I find interesting: Minimal Subfolders in the URL My site deals with trails and courses for both races and general running. The structure for a trail is, for example: /trails/Canada/British-Columbia/Greater-Vancouver-Regional-District/Baden--Powell-Trail/trail/2 The structure for courses is: /course/28 In both cases, the id at the end is used for a database lookup. I'm considering an URL structure that would be: /trail/Baden-Powell-Trail/ca-bc-vancouver This would use the country code (CA) and sub-country code (BC) along with the short name for the region. This could be good because: it puts the main keyword first the URL is much shorter there are only 3 levels in the URL structure However, there is evidence, from Google's Matt Cutts, that the keyword order and URL structure don't matter in that way: See this post: http://www.seomoz.org/q/all-page-files-in-root-or-to-use-directories If Matt Cutts says they aren't so important then why are they listed in the SEOMoz On-Page Report? I'd prefer to use /trail/ca-bc-vancouver/Baden-Powell-Trail. I'll probably do a similar thing for courses. Is this a good idea? Thoughts? Many thanks, in advance, for your help. Cheers, Edward watch?v=l_A1iRY6XTM watch?v=gRzMhlFZz9I
Technical SEO | | esarge0