Duplicate Content - Bulk analysis tool?
-
Hi
I wondered if there's a tool to analyse duplicate content, either within your own site or on external sites, that lets you upload the URLs you want to check in bulk?
I used Copyscape a while ago, but I don't remember it having a bulk feature.
Thank you!
-
Great thank you!
I'll give both a go!
-
Great thanks
Yes, I use Screaming Frog for this, but I wanted to look at the actual page content: partly to see if other sites copy our content, but also to see whether we need to update our product content, as some products are very similar.
I'll check the batch process on Copyscape, thanks!
-
I have not used URL Profiler in this way, but I have used it for other crawler projects related to content cleanup and it is rock solid. The team has been very responsive to my questions about using the software. http://urlprofiler.com/
Duplicate content search is the next project on my list. Here is how they do it:
http://urlprofiler.com/blog/duplicate-content-checker/
You let URL Profiler crawl the section of your site that is most likely to be copied (say, your blog) and you tell it which section of your HTML to compare against (i.e. the content section rather than the header or footer). URL Profiler then uses proxies (you have to buy the proxies) to perform Google searches on sentences from your content. It crawls those results to see if any site in the Google SERPs has sentences from your content word for word (or pretty close).
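To make the matching step a bit more concrete, here is a minimal Python sketch of the idea. It is not URL Profiler's code and it does not run any Google searches. It only assumes you already have the text of your content section and the text of a page you suspect of copying. The function names (`normalize`, `probe_sentences`, `copied_fraction`) and the default thresholds are made up for illustration.

```python
import re


def normalize(text: str) -> str:
    """Lowercase the text and collapse punctuation/whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def probe_sentences(content: str, min_words: int = 8, limit: int = 5) -> list[str]:
    """Pick the longest sentences from the content section.

    Longer sentences are less likely to appear on another site by chance.
    """
    sentences = re.split(r"(?<=[.!?])\s+", content.strip())
    long_enough = [s for s in sentences if len(s.split()) >= min_words]
    return sorted(long_enough, key=lambda s: len(s.split()), reverse=True)[:limit]


def copied_fraction(probes: list[str], candidate_text: str) -> float:
    """Return the share of probe sentences found word for word in the candidate."""
    if not probes:
        return 0.0
    # Pad with spaces so matches start and end on word boundaries.
    haystack = f" {normalize(candidate_text)} "
    hits = sum(1 for p in probes if f" {normalize(p)} " in haystack)
    return hits / len(probes)


if __name__ == "__main__":
    # Hypothetical sample texts, not taken from any real page.
    ours = (
        "Our fryer crumb scoop is built from heavy gauge stainless steel "
        "for daily use. It ships fast."
    )
    theirs = "OUR FRYER CRUMB SCOOP is built from heavy-gauge stainless steel for daily use."
    print(copied_fraction(probe_sentences(ours), theirs))
```

Because the comparison ignores case and punctuation, it only catches straight copies. The "pretty close" matches mentioned above would need a fuzzier comparison.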
I have played with Copyscape, but my markets are too niche for it to work for me. The logic behind URL Profiler's approach is that you are searching the database that matters most: Google.
Good luck!
-
I believe you might be able to use List Mode in Screaming Frog to accomplish this; however, it ultimately depends on what your goal is in checking for duplicate content. Do you simply want to find duplicate titles or duplicate descriptions? Or do you want to find pages with sufficiently similar text as to warrant concern?
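If it is the second case (similar text across your own URLs), one common way to score it is Jaccard similarity over word shingles. The Python sketch below is only an illustration under stated assumptions: it is not how Screaming Frog or Copyscape work internally, it assumes you have already extracted the main text for each URL, and the function names, the 3-word shingle size and the 0.5 threshold are arbitrary choices.

```python
import re
from itertools import combinations


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams ("shingles") in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_pairs(pages: dict[str, str], threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Compare every pair of pages and return those at or above the threshold."""
    sets = {url: shingles(text) for url, text in pages.items()}
    pairs = []
    for url_a, url_b in combinations(sets, 2):
        score = jaccard(sets[url_a], sets[url_b])
        if score >= threshold:
            pairs.append((url_a, url_b, score))
    # Most similar pairs first.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical product copy keyed by made-up paths.
    pages = {
        "/jacket-black": "Mens waterproof jacket with taped seams and adjustable hood in black",
        "/jacket-navy": "Mens waterproof jacket with taped seams and adjustable hood in navy",
        "/crumb-scoop": "Stainless steel fryer crumb scoop for commercial kitchens",
    }
    for url_a, url_b, score in similar_pairs(pages):
        print(url_a, url_b, round(score, 2))
```

Comparing every pair is fine for a few hundred URLs. For a very large list the number of pairs grows quickly, so you would want something smarter than a full pairwise loop.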
== Ooops! ==
It didn't occur to me that you were more interested in duplicate content caused by other sites copying your content rather than duplicate content among your list of URLs.
Copyscape does have a "Batch Process" tool but it is only available to paid subscribers. It does work quite nicely though.