Duplicate content check picking up weird URLs
-
Hi everyone,
I love the duplicate content feature; we have a lot of duplicate content issues due to the way our site is structured, so we're working on them. However, I don't fully understand the results. For example, say I have an article on breast cancer symptoms. It shows up as duplicate content because two URLs point to the exact same page: http://www.healthchoices.ca/articles/breast cancer symptoms and http://www.healthchoices.ca/somerandomstringofcode. I fully understand why that is duplicate content.
What I'm not sure about is that it also picks up the same URL twice and calls it duplicate content. For example, it says that http://www.healthchoices.ca/dr.-so-and-so and http://www.healthchoices.ca/dr.-so-and-so are duplicates... but is this not the same page? Is there something I'm missing? Many of the URLs are identical.
Thanks,
Erin
-
Hi Erin -
Is that a Google Webmaster file?
Looking at those URLs in the SERPs, it seems you have some content causing duplicates (although the file doesn't seem to represent it that way).
Here are the URLs in Google search results for Term-Life-Insurance:
- http://www.healthchoices.ca/video/insurance-and-disability-planning/term-life-insurance
- http://www.healthchoices.ca/video/insurance-and-disability-planning/term-life-insurance/montreal/quebec (duplicate of previous)
- http://www.healthchoices.ca/video-link/insurance-and-disability-planning/Term-Life-Insurance
- http://www.healthchoices.ca/video/insurance-and-disability-planning/term-life-insurance/laval/quebec (duplicate of previous)
Looking at the first two as an example, when you look at the pages themselves they are currently not exact duplicates. The first one is a video of a guy talking about term life insurance with some other video links, and the second is a page that shows an error, "Error: Video Category Page is currently unavailable.", where the page content should be. But that second page was an exact duplicate of the first URL the last time Google visited it.
Here is the first page again:
http://www.healthchoices.ca/video/insurance-and-disability-planning/term-life-insurance
Here is the cached version of the second (duplicate) page (as I'm currently seeing it; it was last cached on Apr 19, 2011):
To see these pages (or any potential duplicate URL issues), do this search in Google:
- site:www.healthchoices.ca
- To find pages with a specific URL pattern (like the term life insurance pages) try "site:www.healthchoices.ca inurl:Term-Life-Insurance" (without the quotation marks)
- Then at the end of the URL you see in the address bar, add "&filter=0" (without the quotes).
So what is in your browser address bar would look like this (although your URL may have some additional things in it, like your previous query, your browser, and your language, for example - that's ok):
http://www.google.com/search?q=site:www.healthchoices.ca+inurl:Term-Life-Insurance&filter=0
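If you want to run that check for several URL patterns, the address can be put together in a few lines of Python. This is only a rough sketch of the steps above: the function name `build_site_search_url` and its parameters are made up for illustration, and it does nothing more than reproduce the `site:`, `inurl:` and `&filter=0` pieces already described.

```python
from urllib.parse import urlencode


def build_site_search_url(domain, inurl=None, unfiltered=True):
    """Build a Google search URL for a site: query (hypothetical helper).

    domain     -- the host to restrict the search to, e.g. "www.healthchoices.ca"
    inurl      -- optional URL fragment to look for, e.g. "Term-Life-Insurance"
    unfiltered -- when True, append filter=0 as described in the steps above
    """
    query = f"site:{domain}"
    if inurl:
        query += f" inurl:{inurl}"

    params = {"q": query}
    if unfiltered:
        params["filter"] = "0"

    # safe=":" keeps the colons readable; spaces still become "+"
    return "http://www.google.com/search?" + urlencode(params, safe=":")


print(build_site_search_url("www.healthchoices.ca", "Term-Life-Insurance"))
```

Paste the printed address into your browser; any extra parameters Google adds on its own are fine.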
I'm not sure what the URL issue is that you're referring to exactly based on the info you pasted and where you may have gotten it from - but I hope this is helpful.
-
Hi Erin,
Can I enquire a little more about where you are lifting these URLs from? I'm assuming you are downloading them from a Campaign? Are the URLs in question lifted from the same row in the CSV? What is the header of the columns they are lifted from? We just need a little more specificity about what we're looking at here in order to respond fully.
-
Thanks for your responses. Hmm... I'm not sure how to do a screenshot, as the only way I could see the errors was to download the file. I've pasted a few below, straight from the doc:
| www.healthchoices.ca/video/ice-sports/default | www.healthchoices.ca/video/ice-sports/default |
| www.healthchoices.ca/video/insurance-and-disability-planning/Key-Man-Insurance | www.healthchoices.ca/video/insurance-and-disability-planning/Key-Man-Insurance |
| www.healthchoices.ca/video/insurance-and-disability-planning/Long-Term-Care-Coverage | www.healthchoices.ca/video/insurance-and-disability-planning/Long-Term-Care-Coverage |
| www.healthchoices.ca/video/insurance-and-disability-planning/Term-Life-Insurance | www.healthchoices.ca/video/insurance-and-disability-planning/Term-Life-Insurance |
| www.healthchoices.ca/video/insurance-and-disability-planning/default | www.healthchoices.ca/video/insurance-and-disability-planning/default |
-
Erin, what tool are you using to find this? It might have something to do with the language that your CMS is written in - it might also be a matter of a trailing slash or a non-www version.
I'd be happy to help if you could provide a little more info, perhaps a screenshot?
Aaron
-
Duplicate content by definition is having the same content on different URLs. I've never had the tool tell me I have duplicate content on the same URL. You must be missing something. Is it www vs non-www perhaps? I don't know how you can get identical URLs showing up in there.
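One way to test the guesses raised in this thread (www vs non-www, a trailing slash, or a difference in letter case like term-life-insurance vs Term-Life-Insurance) is to compare the two strings from the export directly instead of by eye. Below is a minimal sketch in Python; the function name `url_differences` and the wording of the reasons are invented for illustration, and it only covers the handful of causes mentioned here, not everything a crawler might treat as distinct.

```python
from urllib.parse import urlsplit


def _strip_www(host):
    """Drop a leading 'www.' so www and non-www hosts can be compared."""
    return host[4:] if host.startswith("www.") else host


def url_differences(a, b):
    """List the reasons two look-alike URLs are not the same string.

    Returns an empty list when the two strings are exactly equal.
    """
    reasons = []
    if a == b:
        return reasons

    # Stray spaces are easy to miss in a spreadsheet cell
    if a != a.strip() or b != b.strip():
        reasons.append("leading or trailing whitespace")
    a, b = a.strip(), b.strip()

    # urlsplit only recognises the host when a scheme is present,
    # so add one if the export dropped it
    pa = urlsplit(a if "://" in a else "http://" + a)
    pb = urlsplit(b if "://" in b else "http://" + b)

    if pa.scheme != pb.scheme:
        reasons.append("scheme (http vs https)")

    host_a, host_b = pa.netloc.lower(), pb.netloc.lower()
    if host_a != host_b:
        if _strip_www(host_a) == _strip_www(host_b):
            reasons.append("www vs non-www host")
        else:
            reasons.append("different host")

    if pa.path != pb.path:
        if pa.path.rstrip("/") == pb.path.rstrip("/"):
            reasons.append("trailing slash")
        elif pa.path.lower() == pb.path.lower():
            reasons.append("letter case in path")
        else:
            reasons.append("different path")

    if pa.query != pb.query:
        reasons.append("query string")

    if not reasons:
        reasons.append("other difference (fragment or invisible characters)")
    return reasons


print(url_differences(
    "http://www.healthchoices.ca/video/insurance-and-disability-planning/term-life-insurance",
    "http://www.healthchoices.ca/video/insurance-and-disability-planning/Term-Life-Insurance",
))
```

If a pair from the downloaded file comes back as an empty list, the two cells really are the same string, and the question goes back to which columns of the export are being compared.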