URL Parameters as a single solution vs Canonical tags
-
Hi all,
We are running a classifieds platform in Spain (mercadonline.es) that has a lot of duplicate content. The majority of our duplicate content consists of URL's that contain site parameters. In other words, they are the result of multiple pages within the same subcategory, that are sorted by different field names like price and type of ad. I believe if I assign the correct group of url's to each parameter in Google webmastertools then a lot these duplicate issues will be resolved.
Still a few questions remain:
- Once I set f.ex. the 'page' parameter and i choose 'paginates' as a behaviour, will I let Googlebot decide whether to index these pages or do i set them to 'no'? Since I told Google Webmaster what type of URL's contain this parameter, it will know that these are relevant pages, yet not always completely different in content. Other url's that contain 'sortby' don't differ in content at all so i set these to 'sorting' as behaviour and set them to 'no' for google crawling.
- What parameter can I use to assign this to 'search' I.e. the parameter that causes the URL's to contain an internal search string. Since this search parameter changes all the time depending on the user input, how can I choose the best one. I think I need 'specifies'?
- Do I still need to assign canonical tags for all of these url's after this process or is setting parameters in my case an alternative solution to this problem?
I can send examples of the duplicates. But most of them contain 'page', 'descending' 'sort by' etc values.
Thank you for your help.
Ivor
-
Great! All clear to me now.
I'll let you know how things will have developed soon.
Thanks for your input!
Best,
Ivor
-
Hi Ivor,
I wouldn't pay much attention to those Google guidelines about duplicate content.
Yes, Canonical tags are best practice, but what you're dealing with is dynamically generated query URLs from your CMS. If you opted to follow Google's guidelines on this you'd have to either manually set Canonical tags for each query as it is created, or set up a rule to do this automatically.
Both sound tricky to me so I'd just stick with the robots.txt alterations you've made and you should be fine.
Make sure you set back everything to index, follow. This is because you're giving the search engine instructions to ignore specific URLs in the robots.txt and you're also doing this in the meta robots function.
When this occurs the search engine gets confused and then makes it's own best judgement as per the article you've referenced.
Best to keep it simple and leave everything index, follow and keep the robots.txt in place to block these URLs and see how your results go.
Also might be a good idea to touch up your content on the page. I'd suggest about 250 words of content with your targeted keyword twice and 2-3 LSI keywords once each. You can put this at the bottom of the page, after the products so it doesnt push your products down. For more info on content you can check out my blog post here: http://searchfactory.com.au/blog/optimise-content-marketing-writing-for-google-hummingbird-semantic-search/
All the best!
Stel (@StelinSEO )
-
Hi Stel,
It all seems to work fine. After i waited until this morning for the weekly MOZ crawl, I notice the technical issues dropped almost completely. But I keep being confused whether i should allow for these pages still to be set to either "index, follow" or rather to "no-index, no follow"?
Right now, we have set dissallow commands in robots.txt, canonical tags and no index, no follow tags.
If you read Google's guidelines, they don't recommend blocking duplicate content in robots.txt but seem to prefer using canonical tags only https://support.google.com/webmasters/answer/66359
Google does not recommend blocking crawler access to duplicate content on your website, whether with a robots.txt file or other methods. If search engines can't crawl pages with duplicate content, they can't automatically detect that these URLs point to the same content and will therefore effectively have to treat them as separate, unique pages. A better solution is to allow search engines to crawl these URLs, but mark them as duplicates by using the
rel="canonical"
link element, the URL parameter handling tool, or 301 redirects. In cases where duplicate content leads to us crawling too much of your website, you can also adjust the crawl rate setting in Webmaster Tools.And with duplicate content not set to no-index, no-follow they claim they would choose for the right pages to be displayed:
Google tries hard to index and show pages with distinct information. This filtering means, for instance, that if your site has a "regular" and "printer" version of each article, and neither of these is blocked with a noindex meta tag, we'll choose one of them to list. In the rare cases in which Google perceives that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we'll also make appropriate adjustments in the indexing and ranking of the sites involved. As a result, the ranking of the site may suffer, or the site might be removed entirely from the Google index, in which case it will no longer appear in search results.
So if I read this, I should perhaps set my tags to index, follow? And still keep the robots.txt commands and canonical rel tags?
Thanks a lot for your input.
Ivor
-
Hi Ivor,
The problem with _Disallow: /*? _is it only blocks top level queries like this: **mercadonline.es/?page=13&sort=price_true **, but it won't block this: mercadonline.es/anuncios-ciudad-real/?page=13&sort=price_true
So by adding a wildcard directory (i.e. Disallow: //?) this will block queries that occur at any level of your URL structure, like the one second bold example above.
You can indeed just block all queries if you like, but I'm not 100% what your structure is like. If you're sure it won't adversely affect any other pages, then Disallow: //? will solve the sort, price and page issues you've highlighted.
Once you're happy with the robots.txt (just had a look and looks fine to me) run it through screamingfrog and siteliner.com and see if these domains have been blocked and what Duplicate content issues exist.
-
Thank your Donford!
- Ivor
-
Hi Stel,
Thanks for your answer.
- Since we have already added: Disallow: /*? to the robots.txt, will this already exclude all parameters? Or is it better to refine this as you describe as follows:
Disallow: /*/*sort
Disallow: /*/*descending
Disallow: /*/*orderby
- Moreover, would I have to add as well:
Disallow: /*/*page
Disallow: /*page
- Finally, is we have search strings in our parameters; could we add this as well to our robots.txt? Since this content changes all the time.
If you like, I can send you my robots.txt file in a PM.
Thanks a lot for your help!
Ivor
-
Hi Ivor,
I concur with donford's answer, definitely something that can be sorted out by the robots text file. However, I would suggest using the following parameters for robots.txt:
**User-agent: ***
*Disallow: /*/page
*Disallow: /*/sort
*Disallow: /*/descendingMy reason for suggesting the extra /* is this will target URLs that appear on the second or below level.
I may be wrong, but it's best to try both by using the robots.txt checker in Webmaster Tools.
This article will give you an overview of how the robots.txt checker works: https://support.google.com/webmasters/answer/6062598?hl=en
All you have to do is click the link on the post that says robots.txt checker, login to Webmaster Tools and paste everything you see in bold in the text box. Then paste the following (also in bold) into the field below that says Enter a URL to test if it is blocked anuncios-ciudad-real/?page=13&sort=price_true
Click the test button and if it says BLOCKED you can add this to your robots.txt file, stored at top level in your FTP server.
Feel free to Tweet me at @StelinSEO if you have any further issues!
All the best,
Stel
-
Hi Ivor,
This is a very good place for canonical tags. If you put the canonical tag on the root page then you should be okay when the page=2 or sort=Az parameters are added it will still canonical to root page. There is nothing wrong with putting a canonical page tag to itself so there is little worry about.
Fixing parameters in Google is only one of the search engines all the other crawlers won't know what Google sees so it is best to fix it for everybody.
The other option would be to use a exclude in your robots.txt so the pages are not seen as duplicates, but I would advise to use canonical first.
User-agent: *
Disallow: /*page
User-agent: *
Disallow: /*sort
For example.
Hope this helps
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Question on Indexing, Hreflang tag, Canonical
Dear All, Have a question. We've a client (pharma), who has a prescription medicine approved only in the US, and has only one global site at .com which is accessed by all their target audience all over the world.
Intermediate & Advanced SEO | | jrohwer
For the rest of the US, we can create a replica of the home page (which actually features that drug), minus the existence of the medicine, and set IP filter so that non-US traffic see the duplicate of the home page. Question is, how best to tackle this semi-duplicate page. Possibly no-index won't do because that will block the site from the non-US geography. Hreflang won't work here possibly, because we are not dealing different languages, we are dealing same language (En) but different Geographies. Canonical might be the best way to go? Wanted to have an insight from the experts. Thanks,
Suparno (for Jeff)1 -
Canonical Confusion
So I have products appearing in several categories, all of which have the correct canonical url. But Moz is flagging up pages I never knew existed, and I don't understand why they exist at all and more so why my canonical fix isn't occurring for them, as below: SEO Friendly URL: http://thespacecollective.com/nasa-pin-sets/nasa-shuttle-mission-pin-set-no2 Weird URL to same product: http://thespacecollective.com/index.php?route=themecontrol/product&product_id=159 Is this a developer problem rather than an SEO problem?
Intermediate & Advanced SEO | | moon-boots0 -
Use Nonindex or Canonical on product tags of a e-commerce site
I run a e-commerce site and we have many product tags. These product tags come up as "Duplicate Page Content" when Moz does it's crawl. I was wondering if I should use Nonindex or Canonical? The tags all go to the same product when used so I figure I would just nonindex them but was wondering what's the best for SEO?
Intermediate & Advanced SEO | | EmmettButler1 -
Meta Tags (again)
Hey, I know this has been discussed to death but look back through previous postings there doesn't seem to be a consensus on the exact Meta tags that an eCommerce site should include, specifically whether to remove the keyword tag or not since it is believed that Yahoo potentially still makes use of it. Currently our homepage has the following Meta Tags: <title>Buy Printer Cartridges | Ink and Toner Cartridge for Inkjet and Laser Printers</title> Description" content="<a class="attribute-value">Visit Refresh Cartridges for great prices on ink cartridges, toner cartridges, ink, printers and accessories.</a>" /> Keywords" content="<a class="attribute-value">ink cartridges, cheap cartridges, inkjet cartridges, inkjet ink cartridges, ink cartridge, printer ink cartridges, laser cartridges, toner, laser printers</a>" /> Content-Type" content="<a class="attribute-value">text/html; charset=iso-8859-1</a>"/> author" content="<a class="attribute-value">Ink Cartridges, Inkjet Cartridge, Printer Cartridge, Toner Cartridges Refresh Cartridges</a>" /> expires" content="<a class="attribute-value">0</a>" /> robots" content="<a class="attribute-value">noodp,index,follow</a>" /> Language" content="<a class="attribute-value">English</a>" /> Cache-Control" content="<a class="attribute-value">Public</a>" /> verify-v1" content="<a class="attribute-value">sJXqAAWP6ar/LTEOMyUgG6nqothxk62tJTid+ryBJxo=</a>" /> viewport" content="<a class="attribute-value">width=1024</a>" /> This is too messy but before I do something drastic that I'll possibly regret please can you confirm that, in your opinion, I am best to remove everything with the exception of this: <title>Buy Printer Cartridges | Ink and Toner Cartridge for Inkjet and Laser Printers</title> Description" content="<a class="attribute-value">Visit Refresh Cartridges for great prices on ink cartridges, toner cartridges, ink, printers and accessories.</a>" /> Content-Type" content="<a class="attribute-value">text/html; charset=iso-8859-1</a>"/>
Intermediate & Advanced SEO | | ChrisHolgate
viewport" content="<a class="attribute-value">width=1024</a>" /> I realise there is a verify-v1 tag in there but this can be done through a file on our server so while cleaning up that might as well go. Would there be an argument for keeping any of the other tags or are they all pretty much redundant now? Many thanks! Chris0 -
Canonical tag - but Title and Description are slightly different
I am building a new SEO site with a "Silo" / Themed architecture. I have a travel website selling hotel reservations. I list a hotel page under a city page - example, www.abc.com/Dallas/Hilton.html Then I use that same property under a segment within the city - example www.abc.com/Dallas/Downtown/Hilton.html, so there are two URLs with the same content Both pages are identical, except I want to customize the Title and Description. I want to customize the title and description to build a consistent theme - for example the /Downtown/Hilton page will have the words "Near Downtown" in the Title and Description, while the primary city Hilton page will not. So I have two questions about this. First, is it okay to use a canonical tag if the Title and Description are slightly different? Everything else is identical. If so, will Google crawl and comprehend the unique Title and Description on the "Downtown" silo? I want Google to see that I have several "supporting" pages to my main landing page(s). I want to present to Google 5 supporting pages in each silo that each has a supporting keyword theme. But I'm not sure if Google will consider content of pages that point to a different page using the canonical tag. Please see this supporting example: http://d.pr/i/aQPv Thanks for your insights. Rob
Intermediate & Advanced SEO | | partnerf0 -
Canonical link vs root domain
I have a wordpress website installed on http://domain.com/home/ instead of http://domain.com - Does it matter whether I leave it that way with a canonical link from the domain.com to the domain.com/home/ or should I move the wordpress files and database to the root domain?
Intermediate & Advanced SEO | | JosephFrost0 -
Canonical Tags?
I read that Google will "honor" these tags if your website has two url's with duplicate content. The duplicate content does not show up in my SEOmoz crawls report but they do in the search engines and many of "non authoritative links" that are generated from my search feature j(ugly url's with % ...not real user friendly) are ranking higher than the "good URL" links. So if I do the canonical tags I guess my higher ranking bad urls will drop. I even read that google might even completely overlook the links. I read somewhere that the best way to do this is with a 301 redirect...is that correct? I m ranking pretty good with my main keyword terms so I am afraid to make changes not knowing the effect. Any suggestions? Thanks, Boo
Intermediate & Advanced SEO | | Boodreaux0 -
Questions regarding Google's "improved url handling parameters"
Google recently posted about improving url handling parameters http://googlewebmastercentral.blogspot.com/2011/07/improved-handling-of-urls-with.html I have a couple questions: Is it better to canonicalize urls or use parameter handling? Will Google inform us if it finds a parameter issue? Or, should we have a prepare a list of parameters that should be addressed?
Intermediate & Advanced SEO | | nicole.healthline0