Update in Moz spider/tools?? Flagging duplicate content / ignoring canonical
-
Hi all,
Has there been an update in the SEOmoz crawling software?
We now have thousands of dupe content/page title warnings for paginated product page URLs that have correctly formatted canonicals.
e.g.
http://www.woolovers.com/british-wool/mens/tweed-green/wool-countryman-suede-patch-sweater.aspx
... has following pages with identical content that have been flagged:
http://www.woolovers.com/british-wool/mens/olive-green/wool-countryman-suede-patch-sweater.aspx?p=true&rspage=4
...plus 4 more URLs.
But they all have canonical set. There's even a notice at the bottom of report that tells us there's a canonical set to http://www.woolovers.com/british-wool/mens/tweed-green/wool-countryman-suede-patch-sweater.aspx
What gives, SEOmoz ??
Thanks
Michael
-
Hey Lawrence,
Campaigns have a 95% tolerance for duplicate content. This includes all the source code on the page and not just the viewable text. So if a URL is at least 95% similar in code and content to another URL, this warning will appear.
You can run your own tests using this tool: http://www.webconfs.com/similar-page-checker.php
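For a rough local check of the same idea, here is a minimal sketch in Python. It assumes a plain character-level comparison of the full page source using the standard library's difflib; the actual comparison the crawler runs is not described here, so treat the function names and the method as illustrative only.

```python
from difflib import SequenceMatcher

# The 95% tolerance mentioned above, expressed as a ratio.
DUPLICATE_THRESHOLD = 0.95


def similarity(source_a: str, source_b: str) -> float:
    """Return a 0..1 similarity ratio between two page sources (code and text)."""
    # autojunk=False stops difflib from ignoring very common characters
    # in long inputs.
    return SequenceMatcher(None, source_a, source_b, autojunk=False).ratio()


def looks_duplicate(source_a: str, source_b: str,
                    threshold: float = DUPLICATE_THRESHOLD) -> bool:
    """True when the two sources are at least `threshold` similar."""
    return similarity(source_a, source_b) >= threshold
```

Feeding it the raw HTML of two colour variants gives a quick feel for how close they are, although the number will not necessarily match what the campaign reports.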
We don't know what standard Google uses, but it's safe to say they are a bit more sophisticated than us - so you might be okay in this regard as long as you have a couple hundred words of unique text and some unique coding per page. Google won't say how much duplicate content is too much, so we like to be better safe than sorry.
I hope this helps. Let me know if you need further assistance.
-Chiaryn
-
Hi Chiaryn,
Thanks for the reply and explanation. The different colour-specific pages, e.g. Tweed Green and Olive Green, have some different content, but it's nothing like enough in the case of two greens, two blues etc. We simplify colour names for search, so when there is an Olive and a Tweed Green they both end up with 'Green' as the variable in the page title, H1 etc. Will fix this.
Do you think the reviews at the bottom of the pages will also trigger the dupe content warning, i.e. even if we make all other on-page elements unique for each colour URL (page title, H1, H2, product description etc.)? The reviews are quite extensive and are the same on all the colour-specific product page versions of each style. I was wondering today whether we should remove them from these colour product pages (or perhaps let the colour product pages have their own reviews).
http://www.woolovers.com/british-wool/mens/tweed-green/wool-countryman-suede-patch-sweater.aspx
Thanks again
-
Oh, brilliant (re: "See more" aspect). Thanks for the info. Will let you know how we tackle this and the repercussions (!) and look forward to hearing how you get on also!
-
Hi Michael,
Thanks for writing in. I already emailed you in response to the ticket you sent in to the Help Desk, but I will copy my answer here for your review.
--
I looked into your campaign and it seems that this is happening because of where your canonical tags are pointing. These pages are considered duplicates because their canonical tags point to different URLs. For example, http://www.woolovers.com/british-wool/mens/tweed-green/wool-countryman-suede-patch-sweater.aspx is considered a duplicate of http://www.woolovers.com/british-wool/mens/olive-green/wool-countryman-suede-patch-sweater.aspx?p=true&rspage=4 because the canonical tag for the first page is http://www.woolovers.com/british-wool/mens/tweed-green/wool-countryman-suede-patch-sweater.aspx while the canonical for the second URL is http://www.woolovers.com/british-wool/mens/olive-green/wool-countryman-suede-patch-sweater.aspx, with one URL showing tweed-green and the other showing olive-green.
Since the canonical tags point to different URLs it is assumed that http://www.woolovers.com/british-wool/mens/tweed-green/wool-countryman-suede-patch-sweater.aspx and http://www.woolovers.com/british-wool/mens/olive-green/wool-countryman-suede-patch-sweater.aspx are likely to be duplicates themselves.
Here is how our system interprets duplicate content vs. rel canonical:
Assuming A, B, C, and D are all duplicates,
If A references B as the canonical, then they are not considered duplicates
If A and B both reference C as canonical, A and B are not considered duplicates of each other
If A references C as a canonical, A and B are considered duplicates
If A references C as canonical, B references D, then A and B are considered duplicates
The examples you've provided actually fall into the fourth example I've listed above. I hope this clears things up. Please let me know if you have any other questions.
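One way to read the four rules is that each page resolves to an "effective" canonical (its canonical tag if it has one, otherwise its own URL), and two pages with duplicate content are flagged only when those effective canonicals differ. The sketch below encodes that reading. It is an interpretation of the rules as listed, not the crawler's real logic: the function names are hypothetical, a page without a tag is assumed to count as pointing to itself, and URL normalization is ignored.

```python
from typing import Optional


def effective_canonical(url: str, canonical: Optional[str]) -> str:
    """Assumption: a page with no canonical tag counts as its own canonical."""
    return canonical or url


def flagged_as_duplicates(url_a: str, canonical_a: Optional[str],
                          url_b: str, canonical_b: Optional[str]) -> bool:
    """Given two pages already judged to have duplicate content, report
    whether they would still be flagged after canonicals are considered."""
    return (effective_canonical(url_a, canonical_a)
            != effective_canonical(url_b, canonical_b))


# The four cases from the list, in order.
print(flagged_as_duplicates("A", "B", "B", None))  # A -> B
print(flagged_as_duplicates("A", "C", "B", "C"))   # A -> C, B -> C
print(flagged_as_duplicates("A", "C", "B", None))  # A -> C, B has no tag
print(flagged_as_duplicates("A", "C", "B", "D"))   # A -> C, B -> D
```

Under this reading, the tweed-green page (canonical to itself) and the paginated olive-green page (canonical to the olive-green URL) resolve to different canonicals, which is why they are still reported.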
--
-Chiaryn
-
We use the "See more" script on our sites, and from what I understand, at least from other Mozzers, this is an okay practice. http://www.seomoz.org/q/using-more-info-javascript-toggledisplay-tag-for-more-info-text
We also use the rel="prev" and rel="next" to some success, but I can't comment on how that's functioning canonical-wise, because IT WAS DROPPED from our latest redesign and is going to be added to our client's website in the latest release. Oye.
I'd love to hear how this works out for you. There are some really great Mozzers on here with loads of experience about canonical tags and duplicate page issues. Can't wait to see what they have to contribute.
-
Hi there,
Thanks for your response.
It's not product page A being seen as a duplicate of product page B etc, but several versions of product A seen as duplicate due to pagination, stemming from reviews for the products that span several pages, so making the rest of the content, titles etc different other than the (crawlable) reviews isn't really an option.
Will look more into "noindex, follow" tags in pagination.
We could have a View All page for indexing showing all reviews (with lots of scrolling!), with the paginated versions canonicalized to that version. We could still serve the paginated version of the product page from site navigation, perhaps with a "noindex, follow" meta tag. Text doesn’t take long to load and this approach would consolidate the review content.
http://googlewebmastercentral.blogspot.co.uk/2011/09/view-all-in-search-results.html
The other option is to use a rel="prev" and rel="next" implementation, which shows Google the relationship between the pages (not sure if it will still be flagged as dupe content in SEOmoz though! Depends if they follow the tag). This way individual pages might get indexed (not sure if that's a good thing?!), perhaps if there's something in a review from (say) page 5 of the product reviews.
http://googlewebmastercentral.blogspot.co.uk/2011/09/pagination-with-relnext-and-relprev.html
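As a rough illustration of the rel="prev"/rel="next" option, this Python sketch builds the link tags for one page in a paginated review series: the first page gets only "next", the last only "prev", and pages in between get both. The `rspage` parameter name is taken from the paginated URL earlier in the thread; the helper names, the example.com URL, and the assumption that page 1 is the bare URL with no parameter are all made up for the example.

```python
from html import escape
from typing import List
from urllib.parse import urlencode


def page_url(base_url: str, page: int) -> str:
    """Assumption: page 1 is the bare URL, later pages add ?rspage=N."""
    if page == 1:
        return base_url
    return f"{base_url}?{urlencode({'rspage': page})}"


def pagination_link_tags(base_url: str, page: int, total_pages: int) -> List[str]:
    """Return the <link> tags to place in the <head> of the given page."""
    if not 1 <= page <= total_pages:
        raise ValueError("page must be between 1 and total_pages")
    tags = []
    if page > 1:
        href = escape(page_url(base_url, page - 1))
        tags.append(f'<link rel="prev" href="{href}">')
    if page < total_pages:
        href = escape(page_url(base_url, page + 1))
        tags.append(f'<link rel="next" href="{href}">')
    return tags


# Middle page of a three-page series (hypothetical URL).
for tag in pagination_link_tags("https://example.com/product.aspx", 2, 3):
    print(tag)
```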
Ideally I'd like to implement all reviews on one page and hide them with a facebook-style 'See more' function. Not sure if that counts as hiding content? Will look into this.
-
Hi Michael,
Not sure if this helps you out at all, but I found this about the canonicals and SEOMoz crawl report in a previous Q http://mz.cm/11erRj6:
As far as the SEOmoz crawl reports go, note that setting a canonical won't stop these pages being reported as duplicate content.
From the help:
"Keep in mind that that canonicals will stop the pages from ranking against each other, but they will still show up as duplicate content from a UI perspective, so we will still count them as duplicate."
I have the same issues on my accounts. I'm focusing on making the pages' content as unique as possible, or using the "noindex, follow" meta tags to see if that makes a difference.
I know you may have a lot of pages on your website, but perhaps writing short descriptions for your products would help. It might be worthwhile, though it's completely understandable that it may be a huge undertaking if you have hundreds or thousands of pages.