How to fix duplicate content for homepage and index.html
-
Hello,
I know this probably gets asked quite a lot but I haven't found a recent post about this in 2018 on Moz Q&A, so I thought I would check in and see what the best route/solution for this issue might be. I'm always really worried about making any (potentially bad/wrong) changes to the site, as it's my livelihood, so I'm hoping someone can point me in the right direction.
Moz, SEMRush and several other SEO tools are all reporting that I have duplicate content for my homepage and index.html (same identical page).
According to Moz, my homepage (without index.html) has PA 29 and index.html has PA 15. Both return status 200. I read that you can either do a 301 redirect or add rel=canonical.
I currently have a 301 set up from my http page to my https page and don't have any rel=canonical added to the site/page. What is the best and safest way these days to get rid of the duplicate content and merge my non-index and index.html homepages together? I read that both 301 and canonical pass on link juice, but I don't know which route is best for me given the above.
Thank you for reading, any input is greatly appreciated!
-
OK, Paul, I hear what you are saying. It's a very open and obvious diss.
I'm not sure what you are saying makes any difference to the argument that a canonical is not the way to go here. I was explaining it in the simplest way: I would not want, and I'm sure you would not want either, a duplicate of the home page left live and merely canonicalised.
(It's a given that the canonical works like a 301, passing link juice to the preferred version.)
So thanks but it makes no difference - delete & 301 every time.
Google is heightening its distrust of canonicals - the new Search Console tool reveals which pages are the preferred canonical and it's something of a surprise to SEOs!
If you feel like playing top trumps again then why not PM me? - it's so much better and the uninitiated do not need to see it!
Cheers Nigel
-
A proper canonical tag does a lot more than "just be telling Google not to rank it". When used properly (i.e. on pages that truly do contain the same content), the canonicalised page passes its ranking signals back to the canonical source.
I agree with Kristina - while a 301 would be preferable (it's a directive, while canonical tags are taken as suggestions), a canonical tag would be vastly better than not doing anything about the issue. At least until the dev can get the problem with the 301-redirect properly resolved.
Paul
-
It's best practice to redirect, but if that's not an option, the canonical route should help the problem a lot! You'll probably lose some link equity with this route, but it should clear up duplicate content issues from Google's side.
-
Hi Dre
If you just do a canonical then the page will still be live; you will just be telling Google not to rank it. Best practice is to remove it altogether and 301. It is bad practice to have more than one version of your home page (or any page) live!
Regards Nigel
-
Thank you so much for all the responses. So it sounds like 301 redirect through htaccess is the way to go. What is the difference between using the 301 through htaccess vs using rel=canonical in my case? Does the 301 provide better link juice vs rel=canonical or is canonical just not applicable in this case? Thanks for all the replies and helpful suggestions again!
EDIT: I spoke to my developer (who is hosting and maintaining my site now). He said he tried to do the 301 through htaccess, but it seems to crash the site (and trust me, he is very good at what he does). Part of the problem is that my site is VERY old (originally built about 10 years ago and NOT updated once since). He has been slowly updating and cleaning up the site, and he will try to figure out why the 301 is crashing the site and not working. In the meantime, how safe is it to use rel=canonical instead of a 301?
Thanks again!
-
Hi dre
Your site really shouldn't be generating an index.html in the first place, but if it is, you must make sure that there is a 301 in the htaccess file sending all traffic to the single homepage URL. As Lynn correctly points out, this will be a permanent redirect.
It is very simple to do. Both versions are treated as separate pages (just as http and https are), so you are essentially showing a duplicate site to Google, and your rankings will be terrible until you change it.
Regards Nigel
-
Hello there,
You can use .htaccess URL rewriting to remove index.html from your URLs. Here are the rewrite rules:
RewriteEngine On
RewriteRule ^index\.html$ / [R=301,L]
RewriteRule ^(.*)/index\.html$ /$1/ [R=301,L]
Once you have added these rules, you should also fix all your internal links so that they point to the URL without index.html.
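If rules like these send the site into a redirect loop, one common cause (not confirmed for this particular site) is that Apache internally maps a request for / to index.html via DirectoryIndex, and the rule then redirects that internal request back to / again. A minimal sketch of a loop-safe variant follows, assuming Apache with mod_rewrite enabled and an .htaccess file in the document root; it redirects only when the visitor's original request line actually contains index.html.

```apache
RewriteEngine On

# Assumption: this file is the .htaccess in the site's document root.
# THE_REQUEST holds the original request line, e.g. "GET /index.html HTTP/1.1".
# It is not changed by the internal DirectoryIndex lookup for "/", so the
# redirect below fires only when the client explicitly asked for index.html.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/(?:[^?\s]*/)?index\.html[?\s]
# /index.html -> /   and   /folder/index.html -> /folder/
RewriteRule ^(.*/)?index\.html$ /$1 [R=301,L]
```

It is worth trying this on a staging copy first and confirming that both / and /index.html behave as expected before putting it on the live site.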
Hope this helps,
Joseph Yap
-
"I currently have a 301 setup for my http to https page" - great! You should also check that your inner pages redirect from their HTTP versions to HTTPS too.
index.html should redirect to the main version of the homepage with a 301 Permanent Redirect.
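For the inner pages, a site-wide rule along these lines is a common way to send every HTTP URL to its HTTPS equivalent. This is a minimal sketch, assuming Apache with mod_rewrite enabled and an .htaccess file in the document root; it is not taken from the poster's actual configuration.

```apache
RewriteEngine On

# Redirect any request that arrived over plain HTTP to the same host and
# path over HTTPS, using a permanent (301) redirect.
# Assumption: TLS terminates on this server. Behind a proxy or load balancer
# %{HTTPS} may always read "off", which would cause a redirect loop.
RewriteCond %{HTTPS} off
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
```

Requesting a few inner pages over http:// afterwards and checking that each returns a single 301 to its https:// counterpart is a quick way to confirm the rule is working.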
-
Google considers HTTP and HTTPS to be two separate protocols. Since the content is the same on both versions, Googlebot treats it as duplicate content. Adding a canonical URL will solve this problem. If you have any doubts, feel free to ask.