What is Considered Duplicate Content by Crawlers?
-
I am asking because I crawl a site I work on every week with a couple of site audit tools, and they are showing duplicate content issues. I know there are a lot of those on this site, but some of what is flagged as duplicate content makes no sense.
For example, the following URLs were grouped together as duplicate content:
https://www.firefold.com/contact-us
https://www.firefold.com/sale
How are these pages duplicate content? I am confused about what site audit tools consider duplicate content.
Just FYI, this is data from Moz crawl diagnostics but SEMrush site auditor is giving me the same type of data.
Any help would be greatly appreciated.
Ryan
-
Yeah, I just started working on this site. I haven't used Moz Analytics much, so I just wanted to see how their crawler crawls pages.
And yes, I agree, there are a lot of BIG BIG BIG issues with this site.
I've got a large workload over the next few months, haha.
-
I would add that there is no text on any of those three pages - any "text" one would see there is actually just embedded in an image - which is a huge issue for a number of reasons:
- Search engines see that there's no text - a big no-no.
- You're getting practically no SEO value from the content that would be there, even if there isn't much.
- It's heavier this way - which makes load times slower.
I want to clarify that there are many bigger issues with these pages, but as your question concerns only duplicate content, I'll leave all of that out for the time being. To summarize: Google, Yahoo, and Bing are just seeing the same banners, sidebars, etc. on each page, followed by some images in the body. Hence, duplicate content.
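To make that concrete, here is a minimal Python sketch of what a text-only crawler might extract from two such pages. The markup, image file names, and banner text are hypothetical stand-ins, not the real firefold.com source, and real crawlers are certainly more sophisticated than this.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text a crawler could read without rendering images."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical markup: same template, with the body copy baked into an image.
TEMPLATE = (
    "<html><body><div>Free shipping banner</div>"
    "{body}<div>Footer links</div></body></html>"
)
contact_page = TEMPLATE.format(body='<img src="contact-details.png">')
sale_page = TEMPLATE.format(body='<img src="sale-items.png">')

print(visible_text(contact_page))                             # Free shipping banner Footer links
print(visible_text(contact_page) == visible_text(sale_page))  # True
```

Once the images are ignored, the only readable text left on either page is the shared template, so the two pages look identical.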
-
Thanks for that information.
It makes sense looking at the data and pages from that perspective.
-
Hi Ryan!
Our crawler will flag pages that are at least 90% similar across the entire source code of the page, not just the body.
The way to interpret the report is that the contact-us page has 35 duplicates, so "gabe" and "sale" are not dupes of each other in this section; each is only a duplicate of "contact-us". Those URLs might appear with their own duplicates of the same pages further down in the report.
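A toy Python sketch of why two pages can both be duplicates of "contact-us" without being duplicates of each other. The token sets and the Jaccard similarity measure are assumptions chosen for easy arithmetic; they are not how the Moz crawler actually scores pages.

```python
def jaccard(a, b):
    """Share of tokens two pages have in common (0.0 to 1.0)."""
    return len(a & b) / len(a | b)


# Hypothetical token sets: 20 shared template tokens, one swapped per page.
template = {f"tpl{i}" for i in range(20)}
pages = {
    "contact-us": set(template),
    "gabe": (template - {"tpl0"}) | {"gabe-only"},
    "sale": (template - {"tpl19"}) | {"sale-only"},
}


def duplicates_of(hub, pages, threshold=0.90):
    """List the pages at or above the threshold relative to one hub page."""
    return sorted(
        name
        for name, tokens in pages.items()
        if name != hub and jaccard(pages[hub], tokens) >= threshold
    )


print(duplicates_of("contact-us", pages))  # ['gabe', 'sale']
print(duplicates_of("gabe", pages))        # ['contact-us']
```

Each of "gabe" and "sale" shares 19 of 21 tokens with "contact-us" (about 90.5%), but they share only 18 of 22 with each other (about 82%), so neither shows up in the other's group.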
While the pages do not appear similar on the front end, the issue is likely the amount of JavaScript code on those pages.
Our crawler cannot read JavaScript, so we are likely only able to see the template of the page. Other search tools are probably seeing the same thing, since this tool returns 79% similarity for the pages: http://www.freebulkseotools.com/similar-page-checker-tool.php
I can't provide much insight from a dev perspective but hope this helps!
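For anyone who wants to experiment with the idea of whole-source similarity, here is a rough Python sketch using the standard library's difflib. The token-level diff, the sample markup, and the helper names are all assumptions for illustration; the thread does not describe the actual algorithm behind the 90% figure.

```python
from difflib import SequenceMatcher


def similarity(source_a, source_b):
    """Ratio of matching whitespace-separated tokens, from 0.0 to 1.0."""
    return SequenceMatcher(
        None, source_a.split(), source_b.split(), autojunk=False
    ).ratio()


def is_flagged(source_a, source_b, threshold=0.90):
    """True when two sources meet the (assumed) duplicate threshold."""
    return similarity(source_a, source_b) >= threshold


# Hypothetical pages: a heavy shared script block and a tiny unique body.
shared_script = " ".join(f"var widget{i} = init({i});" for i in range(50))


def build_page(body):
    return (
        f"<html><head> <script> {shared_script} </script> </head>"
        f"<body> <nav>Home Sale Contact</nav> {body} </body></html>"
    )


contact_body = '<img src="contact.png">'
sale_body = '<img src="sale.png">'

print(is_flagged(build_page(contact_body), build_page(sale_body)))  # whole source
print(is_flagged(contact_body, sale_body))                          # body only
```

The more shared template and script a page carries relative to its unique body, the closer the whole-source score gets to 1.0, even when the bodies themselves differ.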