Robots.txt to disallow /index.php/ path
-
Hi SEOmoz,
I have a problem with my Joomla site (yeah - me too!). I get a large amount of /index.php/ urls despite using a program to handle these issues. The URLs cause indexation errors with google (404). Now, I fixed this issue once before, but the problem persist. So I thought, instead of wasting more time, couldnt I just disallow all paths containing /index.php/ ?.
I don't use that extension, but would it cause me any problems from an SEO perspective?
How do I disallow all index.php's? Is it a simple: Disallow: /index.php/
-
Hi Cyrus,
Thanks for your reply!
Unfortunately the problem is yet to be fixed, I hope that my disallow will work shortly.
It seems that most of the index.php links to each other internally (and from old /index.php/ pages that no longer exist), which is super weird. How google found them does not make any sense to me.
I don't beleive that external sources are linking to these pages either - I mean, how would they find these links anyway?.
-
Hi Mikkel,
Like Chris, I spidered your site and couldn't find any links to /index.php files, which probably indicates one of two things:
- You've fixed the problem - Yay!
- Or Google is finding those links from external sources
- Google found those links at one time in the past, and is still trying to crawl them.
In the Crawl Errors report in Google Webmaster Tools, if you click on the link of each 404, there's often a "linked from" source where you can see where Google discovered the broken link. This is really helpful in rooting out the cause.
Regardless, I'm going to go with #1 and optimistically believe that you were able to fix the problem.
-
If I spider your site I'm not seeing any /index.php urls. Does that mean you did get Joomla to cooperate with your rewriting?
Or was your problem that you'd previously had urls indexed with /index.php/ paths and you needed to remove them?
-
Hi Mikkel, I have checked your robots.txt, it looks perfect. If you redirect /index.php to home page that using httaccess file or by using any joomla plugin that would great for you. And its also a permanent solution.
-
Well, I tried the sensible solution and redirecting to the correct URL instead. However the SEF program is quite limited and keep on creating new URLs regardless of my modification. Im looking for a more permanent solution, and the disallow seems at bit simple as I'm not a super programmer.
By the way - thanks for quick replys, kudos to both of you!
-
Sure, the website in question is www.vauni.dk
I don't think that there is any inbound links to the index.php pages. They are not easily found.
-
Couldn't you rewrite those /index.php/ urls to remove the /index.php/?
Like this in .htaccess:
RewriteRule ^(.*)$ /index.php/$1 [L]
Only used Joomla once, but there must be a way to configure joomla to just use "/" instead of "/index.php/"?
Update:
Here's a solution to your /index.php/ issue:
http://www.eprcreations.com/remove-index-php-from-joomla-urls/
Once you've updated that, and have your urls working properly without the /index.php/, you could add this slight modification of the rewrite rule above so that all your old /index.php/ urls would be 301'd to your new ones:
RewriteRule ^(.*)$ /index.php/$1 [R=301,L]
Put it underneath the RewriteBase / line they describe in that post.
-
Hi Mikkel,
Do you inbound link pointing to you index.php pages ? If yes, then it might affect your seo. Disallow: /index.ph/ is perfect but after implementing it don't inter link those index.php pages. Can you share me your website URL so that I can show you with example. How to do it.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How to no index / no follow CAD files .dxf .dwg
Hi, I have a new Wordpress site with a number of CAD files (.dxf& .dwg) downloadable straight from the site. These have been flagged in MOZ as warnings with everying from No Title/Description to duplicate content. Does anybody now how I would no index these type of files? Many thanks.
Technical SEO | | Jon_Pearce0 -
Little confused regarding robots.txt
Hi there Mozzers! As a newbie, I have a question that what could happen if I write my robots.txt file like this... User-agent: * Allow: / Disallow: /abc-1/ Disallow: /bcd/ Disallow: /agd1/ User-agent: * Disallow: / Hope to hear from you...
Technical SEO | | DenorL0 -
Robots.txt vs. meta noindex, follow
Hi guys, I wander what your opinion is concerning exclution via the robots.txt file.
Technical SEO | | AdenaSEO
Do you advise to keep using this? For example: User-agent: *
Disallow: /sale/*
Disallow: /cart/*
Disallow: /search/
Disallow: /account/
Disallow: /wishlist/* Or do you prefer using the meta tag 'noindex, follow' instead?
I keep hearing different suggestions.
I'm just curious what your opinion / suggestion is. Regards,
Tom Vledder0 -
Robots.txt blocking Addon Domains
I have this site as my primary domain: http://www.libertyresourcedirectory.com/ I don't want to give spiders access to the site at all so I tried to do a simple Disallow: / in the robots.txt. As a test I tried to crawl it with Screaming Frog afterwards and it didn't do anything. (Excellent.) However, there's a problem. In GWT, I got an alert that Google couldn't crawl ANY of my sites because of robots.txt issues. Changing the robots.txt on my primary domain, changed it for ALL my addon domains. (Ex. http://ethanglover.biz/ ) From a directory point of view, this makes sense, from a spider point of view, it doesn't. As a solution, I changed the robots.txt file back and added a robots meta tag to the primary domain. (noindex, nofollow). But this doesn't seem to be having any effect. As I understand it, the robots.txt takes priority. How can I separate all this out to allow domains to have different rules? I've tried uploading a separate robots.txt to the addon domain folders, but it's completely ignored. Even going to ethanglover.biz/robots.txt gave me the primary domain version of the file. (SERIOUSLY! I've tested this 100 times in many ways.) Has anyone experienced this? Am I in the twilight zone? Any known fixes? Thanks. Proof I'm not crazy in attached video. robotstxt_addon_domain.mp4
Technical SEO | | eglove0 -
How to use robots.txt to block areas on page?
Hi, Across the categories/product pages on out site there are archives/shipping info section and the texts are always the same. Would this be treated as duplicated content and harmful for seo? How can I alter robots.txt to tell google not to crawl those particular text Thanks for any advice!
Technical SEO | | LauraHT0 -
Modx revolution- getting around index.php vs. root duplicate content issue?
Basically, SEOMoz bots are flagging our index.php and root files as duplicate content of one another, therefore cutting the page authority of each. What we want to do is make the root the canonical preference over index.php. Ordinarily, we should be able to do this in the htaccess file. For some reason, as the site has been built into a cms using ModX Revolution, this does not seem to work. We've tried A TON of htaccess rewrite mods to resolve this issue to no avail. We have also tried revising our sitemap to include only the root address. Any ideas? We'll try most anything at this point. Thanks in advance.
Technical SEO | | G2W0 -
Home Page .index.htm and .com Duplicate Page Content/Title
I have been whittling away at the duplicate content on my clients' sites, thanks to SEOmoz's pro report, and have been getting push back from the account manager at register.com (the site was built here and the owner doesn't want to move it). He says these are the exact same page and he can't access one to redirect to the other. Any suggestions? The SEOmoz report says there is duplicate content on both these urls: Durango Mountain Biking | Durango Mountain Resort - Cascade Village http://www.cascadevillagehotel.com/index.htm Durango Mountain Biking | Durango Mountain Resort - Cascade Village http://www.cascadevillagehotel.com/ Your help is greatly appreciated! Sheryl
Technical SEO | | TOMMarketingLtd.0