How to resolve Duplicate Page Content issue for root domain & index.html?
-
SEOMoz returns a Duplicate Page Content error for a website's index page, with both domain.com and domain.com/index.html isted seperately. We had a rewrite in the htacess file, but for some reason this has not had an impact and we have since removed it. What's the best way (in an HTML website) to ensure all index.html links are automatically redirected to the root domain and these aren't seen as two separate pages?
-
great code Josh...but , after i saved it on .htaccess , a "?" appeared on the link..
http://www.domain.com/?/example/file.html
Is this ok ? pls advice/
Thank you,
-
You touched on a good point here "We set up our site to utilize a index redirect for all of our sub directories as well, so with this method you simply name your sub directories to match the url path that you desire. Each sub directory has it's own index which you redirect with a variation of the above code. By doing this you can have nice clean url paths like http://www.semclix.com/design/ecommerce/ - and mitigate the duplicate content issue. We hope that this helps."
Too often I see sites where they get the home page right but miss the re-write on the directories.
-
Here's the .htaccess rewrite command that you can use for the index.html redirect -
Options +FollowSymlinks RewriteEngine on
Index Rewrite RewriteRule ^index.(htm|html|php) http://www.amarasoftware.com/ [R=301,L] RewriteRule ^(.*)/index.(htm|html|php) http://www.amarasoftware.com/$1/ [R=301,L]
We set up our site to utilize a index redirect for all of our sub directories as well, so with this method you simply name your sub directories to match the url path that you desire. Each sub directory has it's own index which you redirect with a variation of the above code. By doing this you can have nice clean url paths like http://www.semclix.com/design/ecommerce/ - and mitigate the duplicate content issue. We hope that this helps.
-
I'd check it with some other software too... i.e. Raven Tools free trial or something, that will tell you if there's canonicalization problems... of course I'm not advocating Raven Tools over SEOmoz tools (I'm a member here and not there for good reasons), I just think best to try a few different tests before deciding if it's a problem. There might just be an issue with the SEOmoz campaign tool for the moment, which I'm sure they'll fix as soon as they realise.
Hey, aren't you the tutor I had in my SEC usability course?
-
Unfortunately I can't speak for how SEOmoz handles rewrites like this if it's already crawled the page.
The rewrite rule you're using looks like it's only rewriting the www portion of the URL, not index.html. So alone it wouldn't do anything to solve dupe content issues. (someone please correct me if I'm misreading the rewrite rule)
Here's a link to what I used to write a redirect for index.html on another site.
http://www.webmasterworld.com/forum92/6375.htm
I think it is a fairly safe assumption to make that SEOmoz is smart enough to realize if you're got a redirect in there (providing that its working). I'd still recommend taking a look to see if Google has cached or indexed an index.html version, though.
Edit: my personal, highly technical, acid-test for an index.html redirect is just going there and manually entering the url with index.html on the end, rather than waiting for a recrawl to see if you're heading in the right direction.
-
RewriteEngine on RewriteCond %{HTTP_HOST} ^([a-z.]+)?amarasoftware.com$ [NC] RewriteCond %{HTTP_HOST} !^www. [NC] RewriteRule .? http://www.%1amarasoftware.com%{REQUEST_URI} [R=301,L] Is what I use. In Seomoz this leads to www.amarasoftware.com and index.html so 2 different URL's, both with different incoming links, and a different authority, which has an impact on my ranking if correct. in SEomoz this a returns a duplicate title and meta tags errors. If SEOmoz finds 2 pages instead of one I may assume that Google agrees with this.
-
As you did, I'd normally handle this with a 301 from index.html to the root domain. When you say that it's "not had an impact" do you mean that the SEOmoz dashboard continues to show an error after it re-crawls, or that the search engines are not picking up the redirect?
SEOmoz dashboard does a great job, but I'd check to see how the search engines are actually indexing yourdomain.com/index.html vs. yourdomain.com also. If the search engines are indexing it as you want them to, then I'd be inclined to ignore the dashboard error.
I apologize if this is a stupid question, but I assume you manually checked that the redirect worked?
-
You wish to canonicalize the pages. That is the SEO word which describes exactly what you are trying to achieve.
Above are 5 URLs which can possibly lead to the exact same page. If you add the following HTML in the code then the pages will be canonicalized.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google Indexing & Caching Some Other Domain In Place of Mine-Lost Ranking -Sucuri.net Found nothing
Again I am facing same Problem with another wordpress blog. Google has suddenly started to Cache a different domain in place of mine & caching my domain in place of that domain. Here is an example page of my site which is wrongly cached on google, same thing happening with many other pages as well - http://goo.gl/57uluq That duplicate site ( protestage.xyz) is showing fully copied from my client's site but showing all pages as 404 now but on google cache its showing my sites. site:protestage.xyz showing all pages of my site only but when we try to open any page its showing 404 error My site has been scanned by sucuri.net Senior Support for any malware & there is none, they scanned all files, database etc & there is no malware found on my site. As per Sucuri.net Senior Support It's a known Google bug. Sometimes they incorrectly identify the original and the duplicate URLs, which results in messed ranking and query results. As you can see, the "protestage.xyz" site was hacked, not yours. And the hackers created "copies" of your pages on that hacked site. And this is why they do it - the "copy" (doorway) redirects websearchers to a third-party site [http://www.unmaskparasites.com/security-report/?page=protestage.xyz](http://www.unmaskparasites.com/security-report/?page=protestage.xyz) It was not the only site they hacked, so they placed many links to that "copy" from other sites. As a result Google desided that that copy might actually be the original, not the duplicate. So they basically hijacked some of your pages in search results for some queries that don't include your site domain. Nonetheless your site still does quite well and outperform the spammers. For example in this query: [https://www.google.com/search?q=](https://www.google.com/search?q=)%22We+offer+personalized+sweatshirts%22%2C+every+bride#q=%22GenF20+Plus+Review+Worth+Reading+If+You+are+Planning+to+Buy+It%22 But overall, I think both the Google bug and the spammy duplicates have the negative effect on your site. We see such hacks every now and then (both sides: the hacked sites and the copied sites) and here's what you can do in this situation: It's not a hack of your site, so you should focus on preventing copying the pages: 1\. Contact the protestage.xyz site and tell them that their site is hacked and that and show the hacked pages. [https://www.google.com/search?q=](https://www.google.com/search?q=)%22We+offer+personalized+sweatshirts%22%2C+every+bride#q=%22GenF20+Plus+Review+Worth+Reading+If+You+are+Planning+to+Buy+It%22 Hopefully they clean their site up and your site will have the unique content again. Here's their email [email protected] 2\. You might want to send one more complain to their hosting provider (OVH.NET) [email protected], and explain that the site they host stole content from your site (show the evidence) and that you suspect the the site is hacked. 3\. Try blocking IPs of the Aruba hosting (real visitors don't use server IPs) on your site. This well prevent that site from copying your site content (if they do it via a script on the same server). I currently see that sites using these two IP address: 149.202.120.102\. I think it would be safe to block anything that begins with 149.202 This .htaccess snippet should help (you might want to test it) #-------------- Order Deny,Allow Deny from 149.202.120.102 #-------------- 4\. Use rel=canonical to tell Google that your pages are the original ones. [https://support.google.com/webmasters/answer/139066?hl=en](https://support.google.com/webmasters/answer/139066?hl=en) It won't help much if the hackers still copy your pages because they usually replace your rel=canonical with their, so Google can' decide which one is real. But without the rel=canonical, hackers have more chances to hijack your search results especially if they use rel=canonical and you don't. I should admit that this process may be quite long. Google will not return your previous ranking overnight even if you manage to shut down the malicious copies of your pages on the hacked site. Their indexes would still have some mixed signals (side effects of the black hat SEO campaign) and it may take weeks before things normalize. The same thing is correct for the opposite situation. The traffic wasn't lost right after hackers created the duplicates on other sites. The effect build up with time as Google collects more and more signals. Plus sometimes they run scheduled spam/duplicate cleanups of their index. It's really hard to tell what was the last drop since we don't have access to Google internals. However, in practice, if you see some significant changes in Google search results, it's not because of something you just did. In most cases, it's because of something that Google observed for some period of time. Kindly help me if we can actually do anything to get the site indexed properly again, PS it happened with this site earlier as well & that time I had to change Domain to get rid of this problem after I could not find any solution after months & now it happened again. Looking forward for possible solution Ankit
Intermediate & Advanced SEO | | killthebillion0 -
Duplicated Pages and Forums
Does duplicate content hurt that particular duplicated content, or the entire site? There are some parts of my site that I don’t care about getting high rankings on search engines. For example, I have a forum and there are certain links that only logged in people can see. If you aren’t logged in, they will take you to a page where it tells u to log in. google, obviously not logged in, interprets this as lots and lots of the same duplicated page. Should I just leave it alone cause I dont care if those pages makes it to search engines. Will it not hurt the entire site? For example, can my homepage search rankings decrase? That leads to my next question. What is the best way to optimize a forum? Whenever someone posts a new post, it seems another url for the same forum thread is created..... which is obviously duplicated….in other words, if like 20 people post on a thread, i believe my site adds 20 urls for that page...anyone know how to fix this?
Intermediate & Advanced SEO | | waltergah0 -
How to remove hundreds of duplicate pages
Hi - while i was checking duplicate links, am finding hundreds of duplicates pages :- having undefined after domain name and before sub page url having /%5C%22/ after domain name and before the sub page url Due to Pagination limits Its a joomla site - http://www.mycarhelpline.com Any suggestions - shall we use:- 301 redirect leave these as standdstill and what to do of pagination pages (shall we create a separate title tag n meta description of every pagination page as unique one) thanks
Intermediate & Advanced SEO | | Modi0 -
Removed Duplicate Domains, What Should I Expect?
Hi All, So I have been at my current company for 5 months now. I quickly realized that they previously bought multiple domains. The domains do make sense (they are mostly our products, etc). However they did not just redirect to our main website, instead, they were a direct copy of our main website. They had it setup so that when we made changes to our main website, www.mainwebsite.com, that the same exact change went to www.productwebsite.com. Basically we had about 7 of the SAME EXACT websites with a different root domain. So I explained to them the problem with having duplicate content on the web and how we are basically just self cannibalizing our online efforts. This problem is fixed now and I am just wondering if anybody has seen the results before? To tell you the truth we already do pretty well SEO-wise. Just wondering if this will make it even better? I am assuming that this will also take a little while to take effect? Thanks! Pat
Intermediate & Advanced SEO | | PatBausemer0 -
Duplicate Content / 301 redirect Ariticle issue
Hello, We've got some articles floating around on our site nlpca(dot)com like this article: http://www.nlpca.com/what-is-dynamic-spin-release.html that's is not linked to from anywhere else. The article exists how it's supposed to be here: http://www.dynamicspinrelease.com/what-is-dsr/ (our other website) Would it be safe in eyes of both google's algorithm (as much as you know) and with Panda to just 301 redirect from http://www.nlpca.com/what-is-dynamic-spin-release.html to http://www.dynamicspinrelease.com/what-is-dsr/ or would no-indexing be better? Thank you!
Intermediate & Advanced SEO | | BobGW0 -
Duplicate WordPress Home Page Issue
I have an issue where I've created a site, (www.tntperformance805.com), using WordPress as a CMS. I enabled the option to use a static page as the home page, and created that page as /home. Well, now the issue that exists is that Google is indexing both www.tntperformance805.com, and www.tntperformance805.com/home/. I've already setup a 301 redirect, pointing /home/ to the main domain, and even have rel=canonical set up automatically, pointing every page to the www version of that particular page. However, Google Webmaster Tools is still reporting the pages as having duplicate page titles and descriptions. I've even had the page removed from Google's cache and index. I'm assuming Google is not considering the 301 redirect, even though it's setup properly. Should I add rel="canonical" href="http://www.tntperformance805.com" /> to the body of the /home/ post, to ensure that it is giving credit to the main domain? I am assuming the page is only redirecting to , as that's the www version, but I thought the 301 redirect would enforce that the search engines should give all credit to the main domain. Thanks in advance for the help everyone. I look forward to some insightful feedback. Best Regards, Matt Dimock
Intermediate & Advanced SEO | | National-Positions0 -
What constitutes duplicate content?
I have a website that lists various events. There is one particular event at a local swimming pool that occurs every few months -- for example, once in December 2011 and again in March 2012. It will probably happen again sometime in the future too. Each event has its own 'event' page, which includes a description of the event and other details. In the example above the only thing that changes is the date of the event, which is in an H2 tag. I'm getting this as an error in SEO Moz Pro as duplicate content. I could combine these pages, since the vast majority of the content is duplicate, but this will be a lot of work. Any suggestions on a strategy for handling this problem?
Intermediate & Advanced SEO | | ChatterBlock0 -
Duplicate Content Through Sorting
I have a website that sells images. When you search you're given a page like this: http://www.andertoons.com/search-cartoons/santa/ I also give users the option to resort results by date, views and rating like this: http://www.andertoons.com/search-cartoons/santa/byrating/ I've seen in SEOmoz that Google might see these as duplicate content, but it's a feature I think is useful. How should I address this?
Intermediate & Advanced SEO | | andertoons0