Duplicate Content & Canonicals
-
I am a bit confused about canonicals and whether they are "working" properly on my site. In Webmaster Tools, about 13,000 pages are flagged for duplicate content, but nearly all of them show two pages: one root URL and a second URL with parameters. For example, these two are showing as duplicate content:
http://www.gallerydirect.com/art/product/vincent-van-gogh/starry-night
We have a canonical tag on each of the pages pointing to the version without the parameters. Pages with other parameters don't show as duplicates; each listing has just one root and one dupe.
So, am I not using the canonical tag properly? It is clearly listed as:
Is the tag perhaps not formatted properly? (I saw someone somewhere state that there needs to be a /> after the URL, but that seems rather picky for Google.)
Suggestions?
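A minimal sketch of the setup described in the question, assuming the canonical is derived simply by dropping the query string. The function names `canonical_url` and `canonical_tag` are hypothetical; the product URLs are the ones quoted in this thread.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    # Keep scheme, host and path; drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_tag(url: str) -> str:
    """Build the <link> element a product page would place in its <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url))}" />'


if __name__ == "__main__":
    variant = (
        "http://www.gallerydirect.com/art/product/vincent-van-gogh/starry-night"
        "?substrate_id=3&product_style_id=8&frame_id=63&size=25x20"
    )
    print(canonical_tag(variant))
    # <link rel="canonical" href="http://www.gallerydirect.com/art/product/vincent-van-gogh/starry-night" />
```

Every parameterized variant of a listing then declares the same parameter-free URL, which is the consolidation the question expects Google to perform.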
-
Thanks, Dr. Pete.
I'll discuss the options with our dev team and see which one will cause the least amount of developer caffeine consumption.
-
Argh... sorry, I didn't even check/see that. Yeah, that may be a real problem - you're basically sending two canonicalization signals that are in conflict. Is there any way to hide the defaults? If the canonicals point to (A), but then (A) redirects to (B), Google may just ignore the canonical.
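The conflict described here (the canonical points to A, but A redirects to B) can be checked mechanically. A rough sketch, assuming you already have each page's declared canonical and the site's redirect map as plain dictionaries; `conflicting_canonicals` and the sample paths are hypothetical.

```python
def conflicting_canonicals(
    canonicals: dict[str, str], redirects: dict[str, str]
) -> dict[str, str]:
    """Return {page: where_its_canonical_ends_up} for canonicals that redirect."""
    conflicts: dict[str, str] = {}
    for page, target in canonicals.items():
        if target not in redirects:
            continue  # canonical target is served directly: signals agree
        final = target
        seen = {target}
        # Follow the redirect chain to its end, stopping if it loops.
        while final in redirects:
            final = redirects[final]
            if final in seen:
                break
            seen.add(final)
        conflicts[page] = final
    return conflicts


if __name__ == "__main__":
    # Hypothetical paths mirroring the thread: the bare URL is the canonical,
    # but the bare URL redirects to a version with default parameters.
    canonicals = {"/starry-night?substrate_id=3": "/starry-night"}
    redirects = {"/starry-night": "/starry-night?substrate_id=3"}
    print(conflicting_canonicals(canonicals, redirects))
    # {'/starry-night?substrate_id=3': '/starry-night?substrate_id=3'}
```

In the example, the canonical chain ends back at the parameterized page itself, which is exactly the mixed signal that may lead Google to ignore the canonical.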
Unfortunately, your options are to either: (1) hope for the best, (2) canonical to the uglier URL, or (3) kill the redirect and set the default parameters on the server-side (without resetting the URL).
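Option (3) amounts to rendering the default configuration at the bare URL instead of redirecting to it. A minimal sketch, assuming the defaults are the values seen in the parameterized URL quoted in this thread (that they are the site's real defaults is an assumption, and `resolve_options` is a hypothetical name):

```python
from urllib.parse import parse_qs

# Hypothetical defaults, copied from the parameterized URL quoted in this thread.
# Whether these are the site's real defaults is an assumption.
DEFAULT_OPTIONS = {
    "substrate_id": "3",
    "product_style_id": "8",
    "frame_id": "63",
    "size": "25x20",
}


def resolve_options(query_string: str) -> dict[str, str]:
    """Fill in default product options server-side instead of redirecting."""
    requested = {key: values[0] for key, values in parse_qs(query_string).items()}
    options = dict(DEFAULT_OPTIONS)
    # Only known option names override a default; anything else (for example a
    # tracking parameter) is ignored and never changes the rendered page.
    options.update({k: v for k, v in requested.items() if k in DEFAULT_OPTIONS})
    return options


if __name__ == "__main__":
    print(resolve_options(""))  # bare URL: rendered with the defaults, no redirect
    print(resolve_options("frame_id=70"))  # an explicit choice overrides one default
```

Because nothing redirects, the bare URL can remain the canonical for every variant without contradicting itself.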
I am primarily seeing the canonical URL in Google's index, so I'm not sure it's actually causing you harm. It's just not an ideal situation.
-
Dr. Pete:
I'm looking into it to be sure, but I believe you are correct that this is an ad-tracking URL.
A follow up question:
The URL that is the canonical version of each page would be in the format of
http://www.gallerydirect.com/art/product/vincent-van-gogh/starry-night
However, this exact URL redirects to one with default parameters for substrate, style and frame size:
Should we change our canonical from the first URL (without the parameters) to the second URL with the parameters? Or is that a moot point with Google?
-
While the properly closed tag should have "... />", that's generally only an issue in very isolated cases. I've never seen it interfere with a canonical tag. It's a harmless change to make (and it is more correct), but my gut reaction is that this will make no difference. Google should be honoring these canonicals.
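To illustrate why the closing slash rarely matters: a standard HTML parser reads both forms of the tag identically. The sketch below uses Python's built-in `html.parser`; it is not Google's parser, so treat it as an illustration only (`CanonicalFinder` and `find_canonicals` are hypothetical names).

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> element in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: list[str] = []

    def handle_starttag(self, tag, attrs):
        # The self-closing form (<link ... />) also lands here, because the
        # default handle_startendtag calls handle_starttag.
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if tag == "link" and "canonical" in rel_values and href:
            self.canonicals.append(href)


def find_canonicals(html: str) -> list[str]:
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonicals


if __name__ == "__main__":
    url = "http://www.gallerydirect.com/art/product/vincent-van-gogh/starry-night"
    unclosed = f'<link rel="canonical" href="{url}">'
    self_closed = f'<link rel="canonical" href="{url}" />'
    # Both forms yield the same canonical URL.
    print(find_canonicals(unclosed) == find_canonicals(self_closed) == [url])  # True
```

A result with more than one URL would mean the page declares conflicting canonicals, which is worth ruling out before worrying about the slash.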
One odd thing I'm seeing. If I dig into the index, I'm finding the following page:
This may be an ad-tracking URL (?) and it's redirecting somehow (but not with a 301 or 302) to the non-canonical URL. This may be sending a mixed signal, and ideally it would redirect to the canonical version of the URL. I'm not sure where this version is coming from, so it's a bit hard to diagnose.
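A redirect that is not a 301 or 302 is often a client-side one, such as a meta refresh or JavaScript. Assuming it is a meta refresh (the thread does not say which), a sketch like this finds the target in the page's HTML; a JavaScript redirect would not be caught. `meta_refresh_target` is a hypothetical name.

```python
import re
from html.parser import HTMLParser
from typing import Optional

# Matches the URL part of a refresh value such as "0; url=http://example.com/".
REFRESH_URL = re.compile(r"url\s*=\s*['\"]?([^'\";]+)", re.IGNORECASE)


class MetaRefreshFinder(HTMLParser):
    """Record the target of the first <meta http-equiv="refresh"> in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.target: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() != "refresh":
            return
        match = REFRESH_URL.search(attributes.get("content") or "")
        if match:
            self.target = match.group(1).strip()


def meta_refresh_target(html: str) -> Optional[str]:
    finder = MetaRefreshFinder()
    finder.feed(html)
    finder.close()
    return finder.target
```

If the target it reports is the non-canonical URL, pointing it at the canonical version instead would remove that particular mixed signal.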
-
Hi Darin,
The tag is not working: if you go into Google and enter the URL http://www.gallerydirect.com/art/product/vincent-van-gogh/starry-night?substrate_id=3&product_style_id=8&frame_id=63&size=25x20 you will see that it is being indexed by Google.
If it's being indexed, then it runs the risk of duplicate content issues.
The tag definitely does need the /> at the end, so the correct usage of the tag would be: <link rel="canonical" href="http://www.gallerydirect.com/art/product/vincent-van-gogh/starry-night" />
I think if you implement that small change, there shouldn't be any problems.
Hope this helps.
Related Questions
-
How Does Google View Hidden Content?
I have a website which contains a lot of content behind a show/hide toggle. Does Google crawl the "hidden" copy?
Web Design | | jasongmcmahon0 -
We added hundreds of pages to our website & restructured the layout to include 3 additional locations within the sub-pages, same brand/domain name. How long could Google take to crawl/index the new pages and rank the keywords used within those pages?
We added hundreds of pages to our website & restructured the layout to include 3 additional locations within the sub-pages, same brand/domain name. The 3 locations' old domains were redirected to their sites within our main brand domain. How long could Google take to crawl/index the new pages and rank the keywords used within those pages? And could this possibly increase our domain authority? We didn't want our brand spread out over multiple websites/domains on the internet. This also allowed more content to be written on the pages for each of our locations' services.
Web Design | | BurgSimpson0 -
Duplicate content on websites for multiple countries
I have a client who has a website for their U.S. based customers. They are currently adding a Canadian dealer and would like a second website with much of the same info as their current website, but with Canadian contact info etc. What is the best way to do this without creating duplicate content that will get us penalized? If we create a website at ABCcompany.com and ABCCompany.ca or something like that, will that get us around the duplicate content penalty?
Web Design | | InvoqMarketing0 -
Avoiding duplicate content with a multi-language site
Hi, we have a client in China that is looking to create three versions of the same website: English, Chinese and Korean. They do not want to use a translation plugin like Google Translate, preferring to have the pages duplicated. What is the best way to do this, bearing in mind that the site needs to be found in all three languages? I would also appreciate it if anyone knows of a good hosting company that has English support on the Chinese mainland. Thanks, Fraser
Web Design | | fraserhannah0 -
Google Bot cannot see the content of my pages
When I go to Google Webmaster Tools, type any URL from the site http://www.ccisolutions.com into the "Fetch as Googlebot" feature and then click the link that says "success," Googlebot sees my pages like this:
<code>HTTP/1.1 200 OK
Date: Tue, 26 Apr 2011 19:11:50 GMT
Server: Apache/2.2.6 (Unix) mod_ssl/2.2.6 OpenSSL/0.9.7a DAV/2 PHP/5.2.4 mod_jk/1.2.25
Set-Cookie: CCISolutions-UT-Status=66.249.72.55.1303845110495128; path=/; expires=Thu, 25-Apr-13 19:11:50 GMT; domain=.ccisolutions.com
Last-Modified: Tue, 28 Oct 2008 14:36:45 GMT
ETag: "314b26-5a-2d421940"
Accept-Ranges: bytes
Content-Length: 90
Keep-Alive: timeout=15, max=99
Connection: Keep-Alive
Content-Type: text/html</code>
Any clue as to why this could be happening?
Web Design | | danatanseo0 -
Why is this page removed from Google & Bing indices?
This page has been removed from the Bing and Google indices, and I can't figure out why.
http://www.pingg.com/occasion/weddings
This page used to be in those indices. There are plenty of internal links to it. The rest of the site is fine. It's not blocked by meta robots, robots.txt or a canonical URL. There's nothing else to suggest that the page is being penalized.
Web Design | | Ehren0 -
Content position on page
I am in the limo service industry, where people are not looking for great content or product descriptions; all they want is a nice Lincoln Town Car and a competitive price. Because I need to put more pictures in front of my customers rather than more content, I am not sure whether having the content lower on the page will affect my rankings. We are transitioning to a new template where we have more control over the layout of the website, but because of the slider on the homepage, the content needs to go further down. We could insert some content in each of the slides, but the page would start looking too "busy". We want customers to see very clearly what we offer: they see the picture, click for more info and book the service. How important is it still to have your keywords in the first hundred words of a webpage? Can we get away with having search engines read the content only after 3-4 slides and their descriptions (about 20 words total)?
Web Design | | echo10 -
Crawl Budget vs Canonical
Got a debate raging here and I figured I'd ask for opinions. We have our websites structured as site/category/product. This is fine for URL keywords, etc. We also use this for breadcrumbs. The problem is that a product can fit into multiple categories, so "product" could also be at site/cat1/product, site/cat2/product or site/cat3/product. Obviously this produces duplicate content. There's no reason why it couldn't live under one URL, but it would take some time and effort to do so (time we don't necessarily have). As such, we're applying the canonical band-aid and calling it good. My problem is that I think this will still kill our crawl budget (this is not an insignificant number of pages we're talking about). In some cases the duplicate pages are bloating a site by 500%. So what say you all? Do we simply use canonical and call it good, or do we need to take the crawl budget into account and actually remove the duplicate pages? Or am I totally off base, and canonical solves the crawl budget issue as well?
Web Design | | Highland
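The canonical "band-aid" described in this question boils down to picking one primary path per product. A sketch, assuming paths of the form /category/product and a hypothetical rule that the alphabetically first category wins (any stable rule would do); `canonical_paths` is a hypothetical name.

```python
from collections import defaultdict


def canonical_paths(paths: list[str]) -> dict[str, str]:
    """Map every /category/product path to a single canonical path per product."""
    variants_by_product: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        # Assumes exactly two segments: /category/product
        _category, product = path.strip("/").split("/")
        variants_by_product[product].append(path)

    mapping: dict[str, str] = {}
    for variants in variants_by_product.values():
        # Hypothetical rule: the alphabetically first category path is canonical.
        primary = min(variants)
        for path in variants:
            mapping[path] = primary
    return mapping


if __name__ == "__main__":
    for path, canonical in canonical_paths(
        ["/cat2/widget", "/cat1/widget", "/cat3/widget", "/cat2/gadget"]
    ).items():
        print(path, "->", canonical)
```

This only decides where each canonical tag points; the duplicate URLs still exist and can still be crawled.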