How does Google decide what content is "similar" or "duplicate"?
-
Hello all,
I have a massive duplicate content issue at the moment with a load of old employer detail pages on my site. We have 18,000 pages that look like this:
http://www.eteach.com/Employer.aspx?EmpNo=26626
http://www.eteach.com/Employer.aspx?EmpNo=36986
Google is classing all of these pages as similar content, which may result in a bunch of them being de-indexed. Although they all look rubbish, some of them are ranking in search engines, and looking at the traffic on a couple of them, it's clear that people who find these pages want more information on the school (everyone seems to click on the local information tab on the page). So I don't want to just get rid of all these pages; I want to add content to them.
But my question is...
If I were to make up, say, 5 templates of generic content, with fields replaced by the school's name, location and headteacher's name so that each page varies from the others, would that be enough for Google to stop classing them as similar or duplicate pages?
e.g. [School name] is a busy and dynamic school led by [headteacher's name] who achieve excellence every year from Ofsted. Located in [location], [school name] offers a wide range of experiences both in the classroom and through extra-curricular activities. We encourage all of our pupils to “Aim Higher”. We value all our teachers and support staff and work hard to keep [school name]'s reputation to the highest standards.
Something like that...
Does anyone know if Google would slap me if I did that across 18,000 pages (with 4 other templates to choose from)?
-
Hi Virginia,
Maybe this Whiteboard Friday can help you out.
-
Hey Virginia
That is essentially what we call near-duplicate content: the kind of content that can easily be created by pulling fields out of a database, generating the pages dynamically and dropping the name, address, etc. into placeholders.
Unique content is essentially that: unique content, so this approach is probably not going to cut it. You could still pull certain elements, such as the address, from the database, but you need to either remove these duplicate blocks and keep the pages simpler (like a business directory) or, ideally, add some unique elements to each page.
These kinds of pages often still rank for very specific queries. Another strategy is to build well thought out landing pages that link to pages like these, which have value for users but are not search friendly.
So, assess how well these work as landing pages from search: are visitors arriving from search, or are they coming in from elsewhere? If they come in from elsewhere, you could noindex these pages or block them in robots.txt. Then target the bigger search terms higher up the tree and create good search landing pages that link to these other pages for users.
This is a really good read to get a better handle on duplicate content types and the relevant strategies:
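For the robots.txt option, here is a minimal sketch in Python that checks whether a rule would stop the employer pages from being crawled. The `Disallow` path is an assumption based on the URLs in the question, not the site's actual robots.txt, and `Googlebot` is used only as an example user agent. The noindex route would instead use a robots meta tag (`<meta name="robots" content="noindex">`) on each page. It is generally one or the other: a page blocked in robots.txt is not crawled, so a noindex tag on it would not be seen.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules. The Disallow path is an assumption taken
# from the URL pattern in the question, not the site's real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /Employer.aspx
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

employer_url = "http://www.eteach.com/Employer.aspx?EmpNo=26626"
home_url = "http://www.eteach.com/"

# can_fetch() returns True when the given user agent may crawl the URL.
employer_blocked = not parser.can_fetch("Googlebot", employer_url)
home_allowed = parser.can_fetch("Googlebot", home_url)

print("Employer page blocked:", employer_blocked)
print("Home page allowed:", home_allowed)
```

The rule matches by path prefix, so every `Employer.aspx?EmpNo=...` URL falls under it while the rest of the site stays crawlable.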
http://moz.com/blog/fat-pandas-and-thin-content
Hope that helps
Marcus
-
Hi Virginia,
If you take your pages as a whole, code and all, the only slight difference in those pages is the
tag and the sidebar info with school address. The rest of the page code is exactly the same.
If you were to create 5 templates similar to:
[School name] is a busy and dynamic school led by [headteacher's name] who achieve excellence every year from Ofsted. Located in [location], [school name] offers a wide range of experiences both in the classroom and through extra-curricular activities. We encourage all of our pupils to “Aim Higher”. We value all our teachers and support staff and work hard to keep [school name]'s reputation to the highest standards.
If all you are doing is changing the [school name], [location], etc., I'm sure Google will still flag these pages as duplicate content.
Unique content is the best way. If there's not a lot of competition for the school name and the page has enough content about each individual school, head teacher, etc., then "templates" might work. You can try it out, but I'd say unique content is the best way. It's the nature of the beast with so many pages.
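To get a rough feel for why swapping a few fields leaves pages looking alike, here is a small Python sketch that compares two template-filled pages using word shingles and Jaccard similarity. This is a common textbook way to measure near-duplicate text and is used here purely as an illustration; it is not a description of how Google actually scores pages. The school names, headteachers, locations and the extra "unique" sentences are all made up.

```python
import re

# Template adapted from the example in the question.
TEMPLATE = (
    "{school} is a busy and dynamic school led by {head} who achieve "
    "excellence every year from Ofsted. Located in {location}, {school} "
    "offers a wide range of experiences both in the classroom and through "
    "extra-curricular activities. We value all our teachers and support "
    "staff and work hard to keep {school}'s reputation to the highest standards."
)


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical schools, invented for illustration only.
page_a = TEMPLATE.format(school="Oakfield Primary", head="Jane Smith", location="Leeds")
page_b = TEMPLATE.format(school="Riverside Academy", head="Tom Brown", location="Bristol")

# Hypothetical unique paragraphs, one per school.
unique_a = (
    " Oakfield opened a new science block last spring and runs a weekly "
    "forest school session in the woodland behind the playing field."
)
unique_b = (
    " Riverside shares its sports hall with the local rowing club and "
    "hosts an annual music festival for students from nearby colleges."
)

template_only = jaccard(page_a, page_b)
with_unique = jaccard(page_a + unique_a, page_b + unique_b)

print(f"Template only:     {template_only:.2f}")
print(f"With unique text:  {with_unique:.2f}")
```

The template-only pair shares every shingle that does not touch a swapped field, so it scores higher than the pair with a unique paragraph added. The more genuinely unique text each page carries, the lower the overlap.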
Hope this helps.
Robert