Checking for content duplication against content on your own site.
-
We are currently rewriting our product descriptions, and I'm afraid some of the salespeople writing them are plagiarizing one another's work. Is there a content duplication checker that lets you check a piece of writing against a specific site rather than the whole web?
-
I assume that you have an admin section in the CMS where you are editing and entering these articles before they go live.
You need to get a developer to write a simple search routine that runs when you create a new article, before it goes live: it takes sections of your content and looks for matches/duplicates in the existing articles. You can require a match on a minimum string of 4 to 5 words, and add other such limits, to make sure you are not matching too many items. It will take a few tests to find the sweet spot between too many matches and not enough.
With 17K pages, this is the only way you can really do this efficiently; you need some IT support/development. They may have to create a reporting layer as well to help you sift through the results.
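As a rough illustration of the idea above, here is a minimal Python sketch that indexes every 5-word run in the existing pages and reports which pages share runs with a new draft. The page ids, sample text, function names and the `min_shared` threshold are all made up for the example; a real version would read the pages from your CMS database.

```python
import re
from collections import defaultdict

def shingles(text, n=5):
    """Return the set of n-word runs in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def build_index(pages, n=5):
    """Map each n-word run to the ids of the pages that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for s in shingles(text, n):
            index[s].add(page_id)
    return index

def find_matches(draft, index, n=5, min_shared=1):
    """Return {page_id: shared runs} for pages sharing at least min_shared runs with the draft."""
    hits = defaultdict(set)
    for s in shingles(draft, n):
        for page_id in index.get(s, ()):
            hits[page_id].add(s)
    return {pid: sorted(ss) for pid, ss in hits.items() if len(ss) >= min_shared}

# Hypothetical sample data standing in for the published product pages.
pages = {
    "sku-100": "This rugged steel bench supports up to 600 pounds and folds flat for storage.",
    "sku-200": "A compact treadmill with a quiet motor and a folding deck.",
}
index = build_index(pages)

draft = "Our new bench supports up to 600 pounds and folds flat, so it fits in any garage."
report = find_matches(draft, index)

# A very small "reporting layer": one line per matching page.
for page_id, shared in report.items():
    print(page_id, len(shared), "shared runs, e.g.", shared[0])
```

Raising `n` or `min_shared` cuts down on false matches from common stock phrases; lowering them catches lighter rewording. Unlike the weekly crawl, the report also lists the exact wording that was reused.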
Good luck.
-
I have two dev servers, and on one of them it is possible to do what you're describing, but that is the least efficient tool to use for this.
The crawl diagnostics are updated about once a week, which means I would have to post the new content and hope I got it online in time for the crawl. If I didn't, I would have to wait an additional week to see results.
The crawl diagnostics also limit the number of pages crawled on your site to 10,000. I stated before that I have over 17,000 pages. So even if I did use this method, the chances of that page being crawled are little better than 50/50 (10,000 out of 17,000 is roughly 59%).
Also, the crawl diagnostics only tell you which pages have duplicate content - not the exact content that was duplicated. That means I'd have to manually find the page I'm targeting, then follow the supposed duplicate content suggestions proposed by the crawler and find the similarities myself.
I think it's very safe to say that neither the crawl diagnostics nor any other product that SEOmoz provides is an answer to my issue. If I thought it was, I would have already been using it and would not have posted this question.
-
Hi Michael,
Having a website that big means that you might have a test or dev environment.
If not, create one.
If you have something like test.yourwebsite.com and submit it to the SEOmoz tools as a new project, you can see a report before your website goes live.
Cornel
-
Those are good answers and would work on a smaller-scale site. We currently have over 17,000 product pages, so I can't really use either method. It's looking like a Google Custom Search is the best bet, even though I can't search an entire paragraph at a time.
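One way to work around the paragraph limit is to break the paragraph into sentence-sized exact-phrase queries and run them one at a time. The Python sketch below only builds the query strings. The domain, the function name and the 30-word cap per query are assumptions made for the example (the cap is a guess at a safe length, not a documented limit), and if the custom search engine is already restricted to your site, the `site:` prefix can be dropped.

```python
import re

def site_queries(paragraph, domain, max_words=30):
    """Build one quoted, site-restricted query per sentence (long sentences are split)."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]
    queries = []
    for sentence in sentences:
        # Drop trailing punctuation and any double quotes that would break the phrase match.
        words = sentence.rstrip(".!?").replace('"', "").split()
        for i in range(0, len(words), max_words):
            chunk = " ".join(words[i:i + max_words])
            queries.append(f'site:{domain} "{chunk}"')
    return queries

# Hypothetical draft text and domain.
draft = "Built to last. Folds flat for storage!"
for query in site_queries(draft, "example.com"):
    print(query)
```

This only finds pages the search engine has already indexed, so it checks a draft against published content, not against other unpublished drafts.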
-
Just off the top of my head, there are a few low-tech ways to do it....
If you have Win 7, the searching has improved greatly - just move all the files to a local machine and search the directory you placed them in for the content you want to check. It will list all files that contain the words (but the results can become overwhelming).
If you have Dreamweaver or another enterprise-level editor - almost all have a site search function where you can search code/text and have it find, one by one, which pages contain the searched terms - or list them globally.
Other than that, probably a custom script - or a Google search for an HTML profiler might help?
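For the custom-script route, a minimal Python sketch might look like the following, assuming the pages have been copied to a local folder as HTML or text files. The function names, file patterns and sample files are invented for the example, and the tag stripping is a crude regular expression rather than a real HTML parser.

```python
import re
import tempfile
from pathlib import Path

def normalize(text):
    """Strip HTML tags, lowercase, and collapse whitespace so phrases match across line breaks."""
    text = re.sub(r"<[^>]+>", " ", text)
    return " ".join(text.lower().split())

def files_containing(root, phrase, patterns=("*.html", "*.htm", "*.txt")):
    """Return the files under root whose visible text contains the phrase."""
    needle = normalize(phrase)
    hits = []
    for pattern in patterns:
        for path in Path(root).rglob(pattern):
            if needle in normalize(path.read_text(encoding="utf-8", errors="ignore")):
                hits.append(path)
    return sorted(hits)

# Demo on two throwaway files; point root at the real local copy of the site instead.
with tempfile.TemporaryDirectory() as root:
    Path(root, "bench.html").write_text("<p>Folds flat\nfor <b>easy</b> storage.</p>", encoding="utf-8")
    Path(root, "treadmill.html").write_text("<p>Quiet motor.</p>", encoding="utf-8")
    found = [p.name for p in files_containing(root, "folds flat for easy storage")]
    print(found)
```

Because the text is normalized before comparing, a whole sentence or paragraph can be pasted in as the phrase even if the page markup breaks it across lines or tags.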
Shane
-
That's for pages that are already published and crawled. I want to be able to search my site for entire sentences and/or paragraphs of text that I have yet to publish so I can make sure they aren't being used elsewhere on the site. The crawl diagnostics tell me I have duplicate content after the fact - I'm trying to take a proactive approach rather than a reactive one.
-
The duplicate content from your website is shown in the SEOmoz tools.
Check the Crawl Diagnostics Summary:
Cornel
-
That site searches the entire web for copies. I'm looking for something to crawl my own site for duplicate content.