Duplicate content /index.php/ issues
-
I'm having some duplicate content issues with Google. I've already got my .htaccess file working just fine as far as I can tell. Rewriting works great, and by using the site you'd never end up on a page with /index.php. However, I do notice that on ANY page of the site you could add /index.php and get the same page, e.g.:
www.mysite.com/category/article
and
www.mysite.com/index.php/category/article
would both return the same page. How can I 301 (or something similar) all /index.php pages to the non-index.php version? I have no desire for any page on my site to have index.php in it; there is no use for it. I'm having quite a hard time figuring this out.
Again, this is basically just for the robots. The URLs the users see are perfect, and I've never had an issue with that. It's just SEOmoz reporting duplicate content, and I've verified that to be true.
-
Hey Emory - if that's the default .htaccess file your software created (I assume this is a Joomla-based site?), it looks like the redirect code you need is already there, but it is disabled by default.
The following code
# Remove index.php or index.htm/html from URL requests
#RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index.(php|html?)\ HTTP/
#RewriteCond %{REQUEST_URI} !^/administrator
#RewriteRule ^([^/]+/)*index.(html?|php)$ http://www.mysite.com/$1 [R=301,L]
should do what you want. The reason it's not currently doing anything is that it has been commented out. The "#" symbol at the beginning of each line tells the server NOT to run the code in that line.
Try removing the "#" symbol in front of the last three lines of that code, save the file, and then thoroughly test your site. (It's not the way I would write it, but there may be specific requirements for your site/system.) The first line is just a descriptive header, so the "#" symbol needs to be left on it.
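For reference, this is roughly what the block looks like once the three lines are enabled. It is only the snippet quoted above with the leading "#" removed, and www.mysite.com is the placeholder hostname used in this thread:

```apache
# Remove index.php or index.htm/html from URL requests
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index.(php|html?)\ HTTP/
RewriteCond %{REQUEST_URI} !^/administrator
RewriteRule ^([^/]+/)*index.(html?|php)$ http://www.mysite.com/$1 [R=301,L]
```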
If for any reason it causes problems, you can simply re-add the "#" symbols and re-save to return the site to its original state.
Give that a shot and let us know if it accomplishes what you want to do.
Paul
P.S. In particular when testing - ensure that client logins work correctly, and that the search function and all plugins also still work.
-
Any ideas/input?
-
Tried that in many ways, but can't get it working. Here is a copy of the .htaccess file; what changes would need to be made (clearly input that code)?
Options +FollowSymLinks
RewriteEngine On
# prevents people from accessing anything with phpMyAdmin
RewriteRule phpMyAdmin - [F]
# Remove index.php or index.htm/html from URL requests
#RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index.(php|html?)\ HTTP/
#RewriteCond %{REQUEST_URI} !^/administrator
#RewriteRule ^([^/]+/)*index.(html?|php)$ http://www.mysite.com/$1 [R=301,L]
# force canonical www if request is for non-www or has port number etc
RewriteCond %{HTTP_HOST} !^(www.mysite.com)?$
RewriteRule (.*) http://www.mysite.com/$1 [R=301,L]
redirect 301 /home.html http://www.mysite.com/
RewriteCond %{QUERY_STRING} mosConfig_[a-zA-Z_]{1,21}(=|%3D) [OR]
RewriteCond %{QUERY_STRING} base64_encode[^(]*\([^)]*\) [OR]
RewriteCond %{QUERY_STRING} (<|%3C)([^s]*s)+cript.*(>|%3E) [NC,OR]
RewriteCond %{QUERY_STRING} GLOBALS(=|\[|\%[0-9A-Z]{0,2}) [OR]
RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})
RewriteRule .* index.php [F]
#RewriteBase /
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
RewriteCond %{REQUEST_URI} !^/index.php
RewriteCond %{REQUEST_URI} (/[^.]*|\.(php|html?|feed|pdf|raw))$ [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . index.php [L]
-
Hi Emory,
A simple solution would be to redirect from index.php to the root in .htaccess using the rule below. Let us know how this works for you.
RewriteRule ^(.*)index.(html|php)$ http://%{HTTP_HOST}/$1 [R=301,L]
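One caveat on the rules quoted in this thread: each of them anchors index.php at the very end of the URL, so none of them matches the URLs from the original question, where a path follows /index.php. A minimal sketch of a rule for that case is below. It is untested against this site and rests on several assumptions: the file is the root .htaccess, www.mysite.com (the placeholder used above) is the canonical host, and the block is placed above the final internal rewrite to index.php.

```apache
# Assumption: root .htaccess, placed above the final "RewriteRule . index.php [L]".
# Only act when the client itself asked for /index.php or /index.php/...
# THE_REQUEST holds the original request line, so requests that were
# rewritten to index.php internally do not trigger the redirect.
# Limited to GET/HEAD so that forms posting to index.php are left alone.
RewriteCond %{THE_REQUEST} ^(GET|HEAD)\s/index\.php(/[^?\s]*)?[?\s]
# /index.php/category/article -> /category/article, bare /index.php -> /
RewriteRule ^index\.php(?:/(.*))?$ http://www.mysite.com/$1 [R=301,L]
```

The THE_REQUEST condition is what should keep this from looping with the internal rewrite. Any query string is carried over to the redirect target by default. As with the earlier advice, check logins, search, and plugins afterwards, since some components may link to /index.php directly.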