Sorting Dupe Content Pages
-
Hi,
I'm no excel pro, and I'm having a bit of a challenge interpreting the Crawl Diagnostics export .csv file.
I'd like to see at a glance which of my pages (and I have many) are the worst offenders for dupe content – ie. which have the most "Other URLs" associated with them.
Thanks, would appreciate any advice on how other people are using this data, and/or how 'Moz recommends to do it.
-
CMC is correct - thats how I do it for larger sites.
- delete all columns except the URL column (col A) and the duplicate pages column (now Col B)
- in cell C2, enter this formula: =len(b2) it will calculate the characters in dupe pages cell
- drag that cell down to last row
- select all three columns and sort col c by largest to smallest
Obviously this isn't going to give you an exact number of dupe pages since URL text strings can vary in length, but it does give you a pretty good idea of the worst offenders....
-
I've found this a little frustrating, too. The display on the web will show the number of duplicate URLs, but the exported spreadsheet does not. It does, however, list all of the duplicate URLs in one cell -- so you could calculate the character length of that cell and then sort by that column, and that would give you a rough ranking.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Why would someone go to same 404 page over and over?
Good morning, I've been using the redirection plugin on my wordpress site and noticed i have multiple IP addresses going to the same folder on my site - like "mydomain.com/folder-name/". The "folder-name" is obviously not anything remotely like any folder or file name I have on my domain - so it's obviously spammy in nature. And, there are multiple IP addresses going to this same URL address every 3 hours on the dot, so it's appears automated. Is this something to be concerned about? Should I "do" anything? Thanks in advance for reading and replying!
Moz Pro | | mlm120 -
Problem with On-page
I have an issue. I have added 5 keywords but when i go to the "on page" tab. They are not there... So i press on "Add keyword" and it takes me to another page where i can see all my keywords. So i go back to the "on page" and no keyword shows up. I wanna have a summary of the weekly crawl for the on page of these keywords and it's not showing up 😞 Anybody knows why?
Moz Pro | | theseolab0 -
Test page ranking in search engine
Is there a tool on SEOMOZ that allows me to get search engine ranking(s) for a specific page with a specific set of keywords?
Moz Pro | | Webxtrakt0 -
Truncate page URLs
We have some pages (for example a contact us form) for which the URL is modified by the CMS depending on the referring page (this helps to put the form submission in context for the sales reps who get the contact submission). The SEOmoz crawler considers each URL a new page -- and so numbers like in diagnostics are all inflated as the same page is listed multiple times (e.g. for too many links) Is there a setting to change what the crawler considers to be the same page? Here are two URLs for the same page that the reports treat as separate pages: http://www.spirent.com/About-Us/Contact_us.aspx?referurl=0F528F4D703D8BB3523738D6373AA8AD http://www.spirent.com/About-Us/Contact_us.aspx?referurl=10ACDA6055244E369395223437FDCF30 The page is actually: http://www.spirent.com/About-Us/Contact_us.aspx Thanks Ken
Moz Pro | | spirent.marcom0 -
I have another Duplicate page content Question to ask.Why does my blog tags come up as duplicates when my page gets crawled,how do I fix it?
I have a blog linked to my web page.& when rogerbot crawls my website it considers tags for my blog pages duplicate content.is there any way I can fix this? Thanks for your advice.
Moz Pro | | PCTechGuy20120 -
How to Fix the Errors with Duplicate Title or Content?
The latest Crawl Diagnostic has found 160 Errors on my site.
Moz Pro | | hanmark
And my error is, that the same content or title is used on two different! pages:
on both my root domain (han-mark.com) and the www subdomain. What does it matter (with or without www)? How serious is that error? Do I need to fix all the errors (and hundreds of warnings too)? What's the best practice? Is there any Guide on how to do it
or Tools for doing it the fast way? Viggo Joergensen0 -
About Duplicate Content found by SEOMOZ... that is not duplicate
Hi folks, I am hunting for duplicate content based on SEOMOZ great tool for that 🙂 I have some pages that are mentioned as duplicate but I cant say why. They are video page. The content is minimalistic so I guess it might be because all the navigation is the same but for instance http://www.nuxeo.com/en/resource-center/Videos/Nuxeo-World-2010/Nuxeo-World-2010-Presentation-Thierry-Delprat-CTO and http://www.nuxeo.com/en/resource-center/Videos/Nuxeo-World-2010/Nuxeo-World-2010-Presentation-Cheryl-McKinnon-CMO are mentioned as duplicate. Any idea? Is it hurting? Cheers,
Moz Pro | | nuxeo0 -
Duplicate page error from SEOmoz
SEOmoz's Crawl Diagnostics is complaining about a duplicate page error. I'm trying to use a rel=canonical but maybe I'm not doing it right. This page is the original, definitive version of the content: https://www.borntosell.com/covered-call-newsletter/sent-2011-10-01 This page is an alias that points to it (each month the alias is changed to point to the then current issue): https://www.borntosell.com/covered-call-newsletter/latest-issue The alias page above contains this tag (which is also updated each month when a new issue comes out) in the section: Is that not correct? Is the https (vs http) messing something up? Thanks!
Moz Pro | | scanlin0