What is Considered Duplicate Content by Crawlers?
-
I am asking because I crawl a site I work on every week with a couple of site audit tools, and they are reporting duplicate content issues. I know this site has a lot of duplicate content, but some of what is being flagged makes no sense.
For example, the following URLs were grouped together as duplicate content:
https://www.firefold.com/contact-us
https://www.firefold.com/sale
How are these pages duplicate content? I am confused about what site audit tools consider duplicate content.
Just FYI, this is data from Moz crawl diagnostics but SEMrush site auditor is giving me the same type of data.
Any help would be greatly appreciated.
Ryan
-
Yeah, I just started working on this site. I haven't used Moz Analytics much, so I just wanted to see how their crawler crawls pages.
And yes, I agree, there are a lot of BIG BIG BIG issues with this site.
I've got a large workload over the next few months, haha.
-
I would add that there is no text on any of those three pages - any "text" one would see there is actually just embedded in an image - which is a huge issue for a number of reasons:
- Search engines see that there's no text - a big no-no.
- You're getting practically no SEO value from the content that would be there, even if there isn't much.
- It's heavier this way - which makes load times slower.
I want to clarify that there are many bigger issues with these pages, but as your question concerns only duplicate content, I'll leave all of that out for the time being. To summarize: Google, Yahoo, and Bing are just seeing some duplicate banners, sidebars, etc., and then some images in the body of your pages. Hence, duplicate content.
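To make the "no text" point concrete, here is a minimal Python sketch of what a text-only reading of a page might look like. It is an illustration, not how any particular search engine or audit tool works: the `VisibleTextExtractor` class, the `visible_text` helper, and the two sample pages are all made up for this example.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, ignoring script/style/noscript content (hypothetical helper)."""

    SKIPPED_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the text a reader without image or JavaScript support would get."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Made-up sample pages: one built only from an image, one with real text.
image_only_page = '<body><img src="banner.png" alt=""><script>var a = 1;</script></body>'
text_page = "<body><h1>Contact us</h1><p>Call 555.</p></body>"

print(repr(visible_text(image_only_page)))
print(repr(visible_text(text_page)))
```

A page whose copy lives inside an image leaves nothing behind once the markup is stripped, so only the shared template remains to compare against other pages.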
-
Thanks for that information.
It makes sense looking at the data and pages from that perspective.
-
Hi Ryan!
Our crawler flags pages that are at least 90% similar across the entire source code of the page, not just the body.
The way to interpret the report is that the contact-us page has 35 duplicates, so "gabe" and "sale" are not dupes of each other in this section; each is only a duplicate of "contact-us". Those URLs might appear with their own duplicates of the same pages further down in the report.
While the pages do not appear similar on the front end, the issue is likely the amount of JavaScript code on those pages.
Our crawler cannot read JavaScript, so we are likely only able to see the template of the page. Other search tools are probably seeing the same thing, as a comparison using this tool returns 79% similarity: http://www.freebulkseotools.com/similar-page-checker-tool.php
I can't provide much insight from a dev perspective but hope this helps!
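For anyone who wants to try a rough version of this comparison themselves, below is a minimal Python sketch. The 90% threshold comes from the reply above, but the similarity measure is an assumption: it uses the standard library's `difflib.SequenceMatcher`, not Moz's actual algorithm, which isn't described in this thread. The function names, the `build_page` helper, and the sample markup are hypothetical.

```python
from difflib import SequenceMatcher

# Threshold taken from the reply above; how the crawler measures similarity is not stated.
DUPLICATE_THRESHOLD = 0.90


def source_similarity(html_a: str, html_b: str) -> float:
    """Return a 0..1 similarity ratio over the full source, not just the body."""
    return SequenceMatcher(None, html_a, html_b, autojunk=False).ratio()


def is_flagged_duplicate(html_a: str, html_b: str,
                         threshold: float = DUPLICATE_THRESHOLD) -> bool:
    """True when two page sources meet or exceed the similarity threshold."""
    return source_similarity(html_a, html_b) >= threshold


def build_page(body: str) -> str:
    """Wrap a small body in a large shared template (made-up markup)."""
    shared_js = "var x=1;" * 50  # stands in for heavy template JavaScript
    return (
        "<html><head><script>" + shared_js + "</script></head>"
        "<body><nav>menu</nav>" + body + "<footer>footer</footer></body></html>"
    )


# Two pages that look different to a visitor but share almost all of their source.
contact_page = build_page('<img src="contact.png">')
sale_page = build_page('<img src="sale.png">')

print(is_flagged_duplicate(contact_page, sale_page))
```

When the template and scripts make up most of the source and the body is only an image tag, the two pages differ by a handful of characters, so a whole-source comparison scores them as near-identical even though they look nothing alike in a browser.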