Preventing indexing of URLs with query strings using robots.txt
-
Dear all,
How can I block URLs/pages with query strings, like page.html?dir=asc&order=name, with robots.txt?
Thanks!
-
Dear all, what is the best option? And are the options below any good?
A: Disallow
- sort-order (Only URLs with value = asc)
"A single URL may contain many parameters for each of which you can specify settings. More restrictive settings override less restrictive settings. For example, here are three parameters and their settings"
source: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1235687
B:
User-agent: Googlebot
Disallow: /*.=name$
for example www.sub.domain.com/collection.html?dir=desc&order=name
source: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449
Thanks!
-
You could always just use rel="canonical", which would be much better than completely blocking all URL parameters.
-
Hey,
Should that second URL be www.sub.domain.com/collection/adresboeken.html?whatever=something? If so, then by using /collection/?* you are saying that anything within /collection/ with a query string should not be indexed. If adresboeken.html always has a query string, it may not get indexed.
The other option I'd consider before using robots.txt is telling Google to ignore dir=desc&order=color in Google Webmaster Tools parameter handling. This is the best way to handle query string issues. (Assuming you are trying to influence Google. Clearly Google Webmaster Tools won't affect Bing!)
Another idea is to set a canonical URL on /collection/adresboeken.html referencing /collection/adresboeken.html without the query string. This tells the search engines that the query strings do not make a unique URL. (adresboeken.html?dir=desc&order=color is the same as adresboeken.html?dir=desc&order=price is the same as adresboeken.html?dir=asc&order=color is the same as adresboeken.html, and so on).
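As a rough illustration of the canonical idea above, here is a minimal Python sketch that strips the query string from each variant and builds the matching link element. The function names are hypothetical, and the http:// scheme is an assumption, since the URLs in this thread are written without one.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for the page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}" />'


# Assumed scheme (http://); the host and path come from the thread.
variants = [
    "http://www.sub.domain.com/collection/adresboeken.html?dir=desc&order=color",
    "http://www.sub.domain.com/collection/adresboeken.html?dir=desc&order=price",
    "http://www.sub.domain.com/collection/adresboeken.html?dir=asc&order=color",
    "http://www.sub.domain.com/collection/adresboeken.html",
]

for variant in variants:
    # Every variant should point at the same query-free URL.
    print(canonical_link_tag(variant))
```

Every variant yields the same tag, which is the point: the sorted views all declare the plain adresboeken.html page as the one to index.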
I hope that helps.
Thanks,
Matthew
-
Hi,
Robots.txt is built mainly on two directives: User-agent and Disallow.
User-agent: the name of the robot you need to block
Disallow: the URL, folder, or URL pattern you need to block.
As you said in your question, you need to block URLs that match a condition. But remember that robots.txt can have serious consequences if you do not use it correctly.
In your question, you wanted to block URLs/pages with query strings like page.html?dir=asc&order=name
so you would use the following:
User-agent: *
Disallow: /*?
The above blocks every URL containing a question mark (?) for all search robots. It will not only block page.html?dir=asc&order=name; it will also block comments.html?dir=asc&order=name.
So use it very carefully.
I hope this is what you were looking for. If you need more help, just ask.
Regards
Prasad
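To see how broad a rule like Disallow: /*? is, here is a minimal Python sketch of wildcard matching. It assumes Google-style semantics (a rule matches from the start of the path, * matches any run of characters, a trailing $ anchors the end of the URL); other crawlers may interpret wildcards differently. The helper names rule_to_regex and is_blocked are hypothetical, not part of any library.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path rule containing * and $ into a regex."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape each literal piece, then join the pieces with "any characters".
    pattern = ".*".join(re.escape(piece) for piece in body.split("*"))
    if anchored:
        pattern += r"\Z"
    return re.compile(pattern)


def is_blocked(rule: str, path: str) -> bool:
    """True if the rule matches the path (including query string) from its start."""
    return rule_to_regex(rule).match(path) is not None


rule = "/*?"
for path in (
    "/page.html?dir=asc&order=name",
    "/comments.html?dir=asc&order=name",
    "/page.html",
):
    print(path, "blocked" if is_blocked(rule, path) else "allowed")
```

Under these assumptions both query-string URLs match the rule and the plain /page.html does not, which is why the rule affects every parameterized URL on the site rather than one page.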
-
Dear all,
thanks for responding. Suppose I have pages like
1. www.sub.domain.com/collection.html, which exists and which I want indexed, and
2. www.sub.domain.com/collection.html?dir=desc&order=color, which I don't want indexed.
Is this the way to do it in robots.txt?
Disallow: /collection/?*
Thanks!
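One way to sanity-check a candidate rule before deploying it is to run it against both URLs. The sketch below assumes Google-style matching (prefix match on path plus query string, * as a wildcard, trailing $ as an end anchor); the matches and report helpers are hypothetical, and the second and third candidate rules are added here only for comparison.

```python
import re


def matches(rule: str, path: str) -> bool:
    """Prefix-match a robots.txt rule against a path, honoring * and a trailing $."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    if anchored:
        regex += r"\Z"
    return re.match(regex, path) is not None


PATHS = [
    "/collection.html",                        # should stay crawlable
    "/collection.html?dir=desc&order=color",   # should be blocked
]

CANDIDATE_RULES = [
    "/collection/?*",     # the rule proposed in the question
    "/collection.html?",  # comparison: prefix of the parameterized URL
    "/*?",                # comparison: any URL containing a question mark
]


def report(rules, paths):
    """Map each rule to the list of paths it would block."""
    return {rule: [p for p in paths if matches(rule, p)] for rule in rules}


for rule, blocked in report(CANDIDATE_RULES, PATHS).items():
    print(f"Disallow: {rule} -> blocks {blocked}")
```

In this sketch /collection/?* blocks neither URL, because both paths continue with ".html" where the rule expects "/?". The /collection.html? prefix blocks only the parameterized URL, and /*? does the same here but would also catch every other query-string URL on the site.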
-
Hi,
Here is an article explaining how to do this in robots.txt:
http://sanzon.wordpress.com/2008/04/29/advanced-usage-of-robotstxt-w-querystrings/
Depending on what you are trying to do, it might also be worth investigating parameter handling in Google Webmaster Tools:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1235687
Thanks,
Matthew