Duplicate content issue
-
Hi everyone,
I have an issue determining what type of duplicate content I have.
www.example.com/index.php?mact=Calendar,m57663,default,1&m57663return_id=116&m57663detailpage=&m57663year=2011&m57663month=6&m57663day=19&m57663display=list&m57663return_link=1&m57663detail=1&m57663lang=en_GB&m57663returnid=116&page=116
Since I am not a coding expert, to me it looks like URL parameter duplicate content. Is it?
At the same time, "return_id" makes me think it is session id duplicate content. I am confused about how to tell the different types of duplicate content apart, even after reading the articles on Seomoz about it: http://www.seomoz.org/learn-seo/duplicate-content.
Could someone help me recognize the different types of duplicate content?
Thank you!
-
Thank you guys for being so helpful!!:)
-
Hello Jeff, I would like to say first that lots of sites have duplicate content problems. For the most part, this is not a huge issue. When search engines find duplicate content, they choose one of the pages to list in the index and ignore the others. This assumes, of course, that the nature of the duplicate content is not so bad that it would lead the search engine to want to ban you. That can happen if a review of your situation causes them to believe that you are deliberately trying to rank multiple times for the same search terms.
Here is a post that covers how to fix duplicate content:
http://www.seomoz.org/blog/duplicate-content-in-a-post-panda-world
-
Let me try.
1. The answer to your first question is that it only matters if you're trying to figure out how to handle it programmatically. In this case you might have to ask the developer whether this is being done by a session id. To me it looks more like a URL parameter, but without a live example I wouldn't know. Could you provide the website in question? If not, try visiting the website once, clear your cache, then visit again and see if the number after "return_id" changes. If it changes, that is a session id. If it stays the same, have a friend visit the website in the same manner and see if the number stays the same. If it changes, then there's a good chance that this is a session id.
Whether or not a session id is what adds it, "return_id" is technically a URL parameter; the only question is whether it is triggered by a session id.
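If you want to do that comparison without eyeballing two long URLs, here is a rough Python sketch. It only assumes you have copied the address bar from two separate visits; the two URLs below are shortened versions of yours and the second value (907) is invented for illustration.

```python
from urllib.parse import urlsplit, parse_qsl

def differing_params(url_a, url_b):
    """Return the names of query parameters whose values differ between two URLs."""
    a = dict(parse_qsl(urlsplit(url_a).query, keep_blank_values=True))
    b = dict(parse_qsl(urlsplit(url_b).query, keep_blank_values=True))
    # A parameter counts as different if it is missing on one side or has another value.
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))

# Hypothetical URLs from two visits (cache cleared in between); values are made up.
visit_1 = "http://www.example.com/index.php?mact=Calendar&m57663return_id=116&page=116"
visit_2 = "http://www.example.com/index.php?mact=Calendar&m57663return_id=907&page=116"

print(differing_params(visit_1, visit_2))
```

Any parameter this reports is a session id candidate. Parameters that never change between visitors are ordinary URL parameters.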
2. The second question is still a bit vague, so let me see if I have it right. Are you asking how to treat the duplicate content once you know what is causing it? If so, then follow these rules.
If the content changes significantly in the presence of the session id or parameter, then this is not duplicate content. If the content does not change, do the following:
- Make sure to use rel canonical pointing to the root URL. In your example that would be: www.example.com/index.php?mact=Calendar
- Set the URL parameters in Google's and Bing's webmaster tools so they treat the parameter correctly.
- When the parameter or session id is present, add the noindex, follow robots tag. This will allow the bots to spider through and pass on link juice in the event that someone links to your parameter versions.
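As a sketch of how the canonical and noindex points could be wired together, here is some Python. The big assumption is the KEEP list: I am guessing that only "mact" decides what the page shows, and your developer would need to confirm that. The function names are made up for the example.

```python
import html
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: "mact" is the only parameter that changes the page content.
KEEP = {"mact"}

def canonical_url(url, keep=KEEP):
    """Drop every query parameter that is not in the keep list."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k in keep]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def head_tags(url):
    """Canonical tag on every version, noindex, follow only on parameter versions."""
    canon = canonical_url(url)
    tags = ['<link rel="canonical" href="%s">' % html.escape(canon, quote=True)]
    if url != canon:
        tags.append('<meta name="robots" content="noindex, follow">')
    return tags

# Simplified version of the URL from the question.
for tag in head_tags("http://www.example.com/index.php?mact=Calendar&m57663year=2011&page=116"):
    print(tag)
```

The webmaster tools setting from the second point is done in the search engines' own interfaces, so it is not part of this sketch.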
I think you have a larger issue, which is that your website's code uses index.php to generate all of the pages; in the example, that page is the calendar. This is a common mistake that programmers make, since they work to do things as quickly and efficiently as possible. It's far easier to keep all of the code in one file than to create several different dynamic files that work with each other.
If you don't have the ability to break this down and generate separate pages, you might be able to use URL rewrites to make browsers and bots think the URLs are actually different.
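To show what a rewrite does, here is the mapping idea in Python. In practice this lives in the web server's rewrite rules rather than in a script, and everything here is hypothetical: the clean /calendar/year/month/day pattern and the short parameter names (year, month, day) are made up and are not the ones your site uses.

```python
import re

# Hypothetical clean URL pattern: /calendar/<year>/<month>/<day>
CLEAN = re.compile(r"^/calendar/(\d{4})/(\d{1,2})/(\d{1,2})$")

def rewrite(path):
    """Map a clean path to the internal index.php URL, or None if it does not match."""
    match = CLEAN.match(path)
    if match is None:
        return None
    year, month, day = match.groups()
    # Parameter names are illustrative only.
    return "/index.php?mact=Calendar&year=%s&month=%s&day=%s" % (year, month, day)

print(rewrite("/calendar/2011/6/19"))
```

Visitors and bots only ever see the clean path, while index.php still does the work behind the scenes.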
-
Thank you for your answers, but I guess I didn't formulate my question properly.
My 1st question was: What kind of duplicate content is it?
- session id
- or url parameter
My second question is: How do you differentiate them? What do you look at to tell whether duplicate content is a session id issue or a URL parameter issue?
-
You can determine whether you have duplicate content in several ways. Search Google for site:example.com and see how many pages Google knows about on your website. Also, when you are on a page with one of these crazy URLs, open the source code and see whether the page has a rel="canonical" tag. For your page, that would be the best way to signal to robots that this is the same page as your index.php page.
Also, you can try Xenu, a good, fast program for checking your site for duplicates.
Hope it helps. You can share your website so we can take a look.
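If you would rather not dig through the source by hand, here is a small Python sketch that looks for the canonical tag in a page's HTML. It assumes you have already saved or copied the page source into a string; the sample markup below is invented.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attrs.get("href")

def find_canonical(html_source):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html_source)
    return finder.canonical

# Invented sample page source.
sample = '<html><head><link rel="canonical" href="http://www.example.com/index.php"></head><body></body></html>'
print(find_canonical(sample))
```

If this returns None for the parameter version of the page, the page is not telling robots which URL is the main one.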
-
Hi Jeff,
index.php is the same as index.php?something=something&anotherthing=somethingelse
Each page should have a different URL, like index.php and page.php, instead of always using index.php.