What is Considered Duplicate Content by Crawlers?
-
I am asking because I use a couple of site audit tools to crawl a site I work on every week, and they are reporting duplicate content issues. I know this site has a lot of duplicate content, but some of what is flagged makes no sense.
For example, the following URLs were grouped together as duplicate content:
https://www.firefold.com/contact-us
https://www.firefold.com/sale
How are these pages duplicate content? I am confused about what site audit tools consider duplicate content.
Just FYI, this is data from Moz crawl diagnostics, but the SEMrush site auditor is giving me the same type of data.
Any help would be greatly appreciated.
Ryan
-
Yeah, I just started working on this site. I haven't used Moz Analytics much, so I just wanted to see how their crawler crawls pages.
And yes, I agree, there are a lot of BIG BIG BIG issues with this site.
I've got a large workload over the next few months haha.
-
I would add that there is no text on any of those three pages - any "text" one would see there is actually just embedded in an image - which is a huge issue for a number of reasons:
- Search engines see that there's no text - a big no-no.
- You're getting practically no SEO value from the content that would be there, even if there isn't much.
- It's heavier this way - which makes load times slower.
I want to clarify that there are many bigger issues with these pages - but as your question concerns only duplicate content, I'll leave all of that out for the time being. To summarize, Google, Yahoo, and Bing are just seeing some duplicate banners, sidebars, etc. and then some images in the body of your pages. Hence, duplicate content.
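To illustrate the point about text living inside images, here is a rough Python sketch of what a text-only crawler could end up with. It uses the standard library's `html.parser`; the markup, the banner text, and the image path are made up for illustration and are not taken from the actual pages, and real search engine crawlers are far more sophisticated than this.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped tag.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text a simple, non-rendering crawler could read from a page."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: shared banner, body copy exists only as an image.
image_only_page = """
<html><body>
  <div class="banner">Free shipping on all orders</div>
  <img src="/media/contact-details.png">
</body></html>
"""

# Hypothetical page: same banner, body copy as real text.
text_page = """
<html><body>
  <div class="banner">Free shipping on all orders</div>
  <p>Contact our support team by phone or email.</p>
</body></html>
"""

print(visible_text(image_only_page))
print(visible_text(text_page))
```

For the image-only page, the only readable text is the shared banner, so every page built that way looks the same to a tool that compares text.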
-
Thanks for that information.
It makes sense looking at the data and pages from that perspective.
-
Hi Ryan!
Our crawler flags pages whose entire source code, not just the body, is at least 90% similar.
The way you want to interpret the report is the contact-us page has 35 duplicates, so "gabe" and "sale" are not dupes of each other in this section but are only each a duplicate of "contact-us". Those URLs might appear with their own duplicates of the same pages further down in the report.
While the pages do not appear similar on the front end, the issue is likely the amount of JavaScript code on those pages.
Our crawler cannot read JavaScript, so we are likely only able to see the template of the page. Other search tools are probably seeing the same thing, as this tool returns 79% similarity for these pages: http://www.freebulkseotools.com/similar-page-checker-tool.php
I can't provide much insight from a dev perspective but hope this helps!
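To make the 90% idea concrete, here is a rough Python sketch of a whole-source comparison. This is not Moz's actual algorithm, which isn't described in this thread; it uses `difflib.SequenceMatcher` from the standard library as a stand-in, and `build_page`, the widget script names, and the page bodies are all hypothetical.

```python
from difflib import SequenceMatcher

# The 90% figure quoted in the reply above.
DUPLICATE_THRESHOLD = 0.90


def source_similarity(html_a, html_b):
    """Return a 0..1 similarity ratio over the whole page source, line by line."""
    lines_a = [line.strip() for line in html_a.splitlines() if line.strip()]
    lines_b = [line.strip() for line in html_b.splitlines() if line.strip()]
    return SequenceMatcher(None, lines_a, lines_b, autojunk=False).ratio()


def is_flagged_duplicate(html_a, html_b, threshold=DUPLICATE_THRESHOLD):
    """True when two page sources meet or exceed the similarity threshold."""
    return source_similarity(html_a, html_b) >= threshold


def build_page(body):
    """Hypothetical template: lots of shared script tags around a small body."""
    shared_scripts = [
        f'<script src="/js/widget-{i}.js"></script>' for i in range(20)
    ]
    return "\n".join(
        ["<html>", "<head>", *shared_scripts, "</head>",
         "<body>", body, "</body>", "</html>"]
    )


# Two pages that look different to a visitor but share almost all of their source.
contact_page = build_page('<img src="/media/contact.png">')
sale_page = build_page('<img src="/media/sale.png">')

ratio = source_similarity(contact_page, sale_page)
print(f"similarity: {ratio:.0%}")
print("flagged as duplicate:", is_flagged_duplicate(contact_page, sale_page))
```

In this sketch the two pages differ by a single line out of 27, so the shared template dominates the score and the pair is flagged even though the visible content is different.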