Google Indexing Duplicate URLs : Ignoring Robots & Canonical Tags
-
Hi Moz Community,
We have the following robots command that should prevent URLs with tracking parameters being indexed.
Disallow: /*?
We have noticed google has started indexing pages that are using tracking parameters. Example below.
These pages are identified as duplicate content yet have the correct canonical tags:
With various affiliate feeds available for our site, we effectively have duplicate versions of every page due to the tracking query that Google seems to be willing to index, ignoring both robots rules & canonical tags.
Can anyone shed any light onto the situation?
-
Google's multi-layered multi-algorithm system has come a long way in being able to "figure it all out", yet at the same time, falls far short of always successfully "getting it right".
Robots.txt files are no longer an absolute directive. They're now "just another signal", as are canonical tags, meta robots instructions, and their own Google Webmaster URL Parameters system.
Because of this its critical to be consistent across all signals. If you've got the robots.txt file set to not index pages, but also have inbound links from affiliates, that's a prime example of where inbound link signals can override the robots.txt file's instruction if they're not nofollowed links.
While they technically SHOULD not index them after discovering them off-site (because the destination says "index this other version"), that's part of their confused multilayered system.
I have a question though - from what limited information you've provided, this example is based on a url parameter of ?ec=
When I search Google using site:http://www.oakfurnitureland.co.uk/ inurl:ec
I see only three such pages indexed AND where those pages are "fully" indexed. All the rest (over 1,000 additional URLs), are in the Google system, however every one of those others has a meta description of "A description for this result is not available because of this site's robots.txt - learn more."
What that means is they are NOT fully indexing those pages - there is no worry to be had about duplicate content for those. Google is simply tracking that those URLs exist.
So - is that the only URL parameter you're worried about? If so, it's not a major problem on your site. Except for those few exceptions, Google is doing what you need them to do with those.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Are backlinks within duplicate content ignored or devalued?
From what I understand, Googles no longer has a "Duplicate Content Penalty" instead duplicate content simply isn't show in the search results. Does that mean that any links in the duplicate content are completely ignored, or devalued as far as the backlink profile of the site they are linking to? An example would be an article that might be published on two or three major industry websites. Are only the links from the first website GoogleBot discovers the article on counted or are all the links counted and you just won't see the article itself come up in search results for the second and third website?
Intermediate & Advanced SEO | | Consult19010 -
Is it possible that Google would disregard canonical tag?
Hi all, I was wondering if it is possible for Google to diregard the canonical tag, if for example they decide it is wrongly put based on behavioural data. On the Natviscript Blog's individual blog posts there is a canonical tag for the www.nativescript.org/blog/details (printscreen - http://prntscr.com/e8kz5k). In my opinion it should not be there, and I've put request to our Engineering team for removal some time ago. Interestingly, all blog posts are indexed and got decent amount of organic traffic despite the tag. What do you think? Could it be that Google would disregard the tag based on usage data from let's say GA? Thanks, Lily
Intermediate & Advanced SEO | | lgrozeva0 -
Google Not Pulling The Right Title Tag & Meta Description
Hi guys. We've found Google is pulling the wrong information for our title tag and meta description. Instead of pulling the actual title tag, Google is pulling the menu name you click on to get to the page: "Bike Barcelona" instead of "Barcelona Bike Tours | ...." Also, we've found that, instead of pulling the meta description we wrote, Google is using text from the pages copy. Any tips?
Intermediate & Advanced SEO | | BarcelonaExperience0 -
How to make Google index your site? (Blocked with robots.txt for a long time)
The problem is the for the long time we had a website m.imones.lt but it was blocked with robots.txt.
Intermediate & Advanced SEO | | FCRMediaLietuva
But after a long time we want Google to index it. We unblocked it 1 week or 8 days ago. But Google still does not recognize it. I type site:m.imones.lt and it says it is still blocked with robots.txt What should be the process to make Google crawl this mobile version faster? Thanks!0 -
Thousands of Web Pages Disappered from Google Index
The site is - http://shop.riversideexports.com We checked webmaster tools, nothing strange. Then we manually resubmitted using webmaster tools about a month ago. Now only seeing about 15 pages indexed. The rest of the sites on our network are heavily indexed and ranking really well. BUT the sites that are using a sub domain are not. Could this be a sub domain issue? If so, how? If not, what is causing this? Please advise. UPDATE: What we can also share is that the site was cleared twice in it's lifetime - all pages deleted and re-generated. The first two times we had full indexing - now this site hovers at 15 results in the index. We have many other sites in the network that have very similar attributes (such as redundant or empty meta) and none have behaved this way. The broader question is how to do we get the indexing back ?
Intermediate & Advanced SEO | | suredone0 -
Why will google not index my pages?
About 6 weeks ago we moved a subcategory out to becomne a main category using all the same content. We also removed 100's of old products and replaced these with new variation listings to remove duplicate content issues. The problem is google will not index 12 critcal pages and our ranking have slumped for the keywords in the categories. What can i do to entice google to index these pages?
Intermediate & Advanced SEO | | Towelsrus0 -
Canonical URL Question
Hi Everyone I like to run this question by the community and get a second opinion on best practices for an issue that I ran into. I got two pages, Page A is the original page and Page B is the page with duplicate content. We already added** ="Page A**" />** to the duplicate content (Page B).** **Here is my question, since Page B is duplicate content and there is a link rel="canonical" added to it, would you put in the time to add meta tags and optimize the title of the page? Thanks in advance for all your help.**
Intermediate & Advanced SEO | | DRTBA0 -
Blocking Dynamic URLs with Robots.txt
Background: My e-commerce site uses a lot of layered navigation and sorting links. While this is great for users, it ends up in a lot of URL variations of the same page being crawled by Google. For example, a standard category page: www.mysite.com/widgets.html ...which uses a "Price" layered navigation sidebar to filter products based on price also produces the following URLs which link to the same page: http://www.mysite.com/widgets.html?price=1%2C250 http://www.mysite.com/widgets.html?price=2%2C250 http://www.mysite.com/widgets.html?price=3%2C250 As there are literally thousands of these URL variations being indexed, so I'd like to use Robots.txt to disallow these variations. Question: Is this a wise thing to do? Or does Google take into account layered navigation links by default, and I don't need to worry. To implement, I was going to do the following in Robots.txt: User-agent: * Disallow: /*? Disallow: /*= ....which would prevent any dynamic URL with a '?" or '=' from being indexed. Is there a better way to do this, or is this a good solution? Thank you!
Intermediate & Advanced SEO | | AndrewY1