Checking for content duplication against content on your own site.
-
We are currently rewriting our product descriptions, and I'm afraid some of the salespeople writing them are plagiarizing one another's work. Is there a content duplication checker that lets you check a piece of writing against a specific site rather than the whole web?
-
I assume that you have an admin section in the CMS where you are editing and entering these articles before they go live.
You need to get a developer to write a simple search routine that runs when you create a new article, before it goes live. It takes sections of your content and looks for matches/duplicates among your existing pages. You can require a match on a string of at least 4 to 5 words, plus other such limits, to make sure you are not matching too many items. It will take a few tests to find the sweet spot between too many matches and not enough.
With 17K pages, this is the only way you can really do this efficiently; you need some IT support/development. They may have to create a reporting layer as well to help you sift through the results.
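To give your developer a starting point, here is a rough sketch of that kind of check in Python. It is only an illustration, not a finished tool: the function names (`shingles`, `build_index`, `find_overlaps`), the sample pages, and the 5-word minimum are all assumptions, and in practice the page text would come from your CMS database rather than a hard-coded dictionary.

```python
import re
from collections import defaultdict


def shingles(text, size=5):
    """Return the set of every run of `size` consecutive words in `text`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def build_index(pages, size=5):
    """Map each word run to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for shingle in shingles(text, size):
            index[shingle].add(page_id)
    return index


def find_overlaps(draft, index, size=5, min_shared=1):
    """Return {page_id: sorted shared word runs} for pages overlapping the draft."""
    shared = defaultdict(set)
    for shingle in shingles(draft, size):
        for page_id in index.get(shingle, ()):
            shared[page_id].add(shingle)
    return {pid: sorted(s) for pid, s in shared.items() if len(s) >= min_shared}


# Hypothetical sample data standing in for the published product pages.
published = {
    "widget-a": "This rugged widget is built to last and ships free anywhere.",
    "widget-b": "A lightweight gadget for everyday carry.",
}
index = build_index(published)

# A draft description that has not gone live yet.
draft = "Our new model is built to last and ships free to most regions."
report = find_overlaps(draft, index)
for page_id, phrases in report.items():
    print(page_id, phrases)
```

Raising `size` or `min_shared` cuts down on false positives, which is the sweet-spot tuning mentioned above. The dictionary that `find_overlaps` returns is a bare-bones version of the reporting layer: it shows which page matched and exactly which phrases were shared.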
Good luck.
-
I have two dev servers, and on one of them it would be possible to do what you're talking about, but that is the absolute least efficient tool to use for this.
The crawl diagnostics are updated about once a week which means I would have to post the new content and hope I got it online in time for the crawl. If I didn't then I would have to wait an additional week to see results.
The crawl diagnostics also limit the number of pages crawled on your site to 10,000. I stated before that I have over 17,000 pages. So even if I did use this method, the chances of that page being crawled are little better than 50/50.
Also, the crawl diagnostics only tell you what pages have duplicate content - not the exact content that was duplicated. That means I'd have to manually find the page I'm targeting, then follow the supposed duplicate content suggestions proposed by the crawler and find the similarities myself.
I think it's very safe to say that neither the crawl diagnostics nor any other product that SEOmoz provides is an answer to my issue. If I thought it was, I would have already been using it and would not have posted this question.
-
Hi Michael,
Having a website that big means that you probably have a test or dev environment.
If not, create one.
If you have something like test.yourwebsite.com and submit it to the SEOmoz tools as a new project, you can see a report before your website goes live.
Cornel
-
Those are good answers and would work on a smaller-scale site. We currently have over 17,000 product pages, so I can't really use either method. It's looking like a Google Custom Search is the best bet, even though I can't search an entire paragraph at a time.
-
Just off the top of my head, there are a few low tech ways to do it....
If you have Win 7, the searching has improved greatly - just copy all the files to a local machine and search the directory you placed them in for the content you want to check. It will list every file that contains the words (but the results can become overwhelming).
If you have Dreamweaver or another enterprise-level editor, almost all have a site search function where you can search/profile code/text and have it find the pages containing the searched terms one by one, or list them all at once.
Other than that, a custom script or a Google search for an HTML profiler might help.
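For the custom script route, something like the following Python sketch would do the local-folder search in one pass. It assumes you have a copy of the site's files in a directory on your machine; the function names (`normalize`, `files_containing`), the `*.html` pattern, and the demo files are made up for illustration.

```python
import tempfile
from pathlib import Path


def normalize(text):
    """Lowercase and collapse whitespace so line breaks do not hide a match."""
    return " ".join(text.lower().split())


def files_containing(root, phrase, pattern="*.html"):
    """Return sorted relative paths under `root` whose text contains `phrase`."""
    needle = normalize(phrase)
    hits = []
    for path in Path(root).rglob(pattern):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        if needle in normalize(text):
            hits.append(path.relative_to(root).as_posix())
    return sorted(hits)


# Demo with a throwaway directory standing in for the local copy of the site.
with tempfile.TemporaryDirectory() as site_dir:
    root = Path(site_dir)
    (root / "a.html").write_text("<p>Built to last and\nships free.</p>", encoding="utf-8")
    (root / "b.html").write_text("<p>A lightweight gadget.</p>", encoding="utf-8")
    matches = files_containing(root, "built to last and ships free")
    print(matches)
```

This only finds exact phrases (ignoring case and line breaks), so a sentence that was lightly reworded, or one broken up by HTML tags in the middle, would slip through.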
Shane
-
That's for pages that are already published and crawled. I want to be able to search my site for entire sentences and/or paragraphs of text that I have yet to publish so I can make sure it's not being used elsewhere on the site. The crawl diagnostics tell me I have duplicate content after the fact - I'm trying to take a proactive approach rather than a reactive one.
-
The duplicate content from your website is shown in the SEOmoz tools.
Check the Crawl Diagnostics Summary:
Cornel
-
That site searches the entire web for copies. I'm looking for something to crawl my own site for duplicate content.