Robots.txt to disallow /index.php/ path
-
Hi SEOmoz,
I have a problem with my Joomla site (yeah - me too!). I get a large amount of /index.php/ urls despite using a program to handle these issues. The URLs cause indexation errors with google (404). Now, I fixed this issue once before, but the problem persist. So I thought, instead of wasting more time, couldnt I just disallow all paths containing /index.php/ ?.
I don't use that extension, but would it cause me any problems from an SEO perspective?
How do I disallow all index.php's? Is it a simple: Disallow: /index.php/
-
Hi Cyrus,
Thanks for your reply!
Unfortunately the problem is yet to be fixed, I hope that my disallow will work shortly.
It seems that most of the index.php links to each other internally (and from old /index.php/ pages that no longer exist), which is super weird. How google found them does not make any sense to me.
I don't beleive that external sources are linking to these pages either - I mean, how would they find these links anyway?.
-
Hi Mikkel,
Like Chris, I spidered your site and couldn't find any links to /index.php files, which probably indicates one of two things:
- You've fixed the problem - Yay!
- Or Google is finding those links from external sources
- Google found those links at one time in the past, and is still trying to crawl them.
In the Crawl Errors report in Google Webmaster Tools, if you click on the link of each 404, there's often a "linked from" source where you can see where Google discovered the broken link. This is really helpful in rooting out the cause.
Regardless, I'm going to go with #1 and optimistically believe that you were able to fix the problem.
-
If I spider your site I'm not seeing any /index.php urls. Does that mean you did get Joomla to cooperate with your rewriting?
Or was your problem that you'd previously had urls indexed with /index.php/ paths and you needed to remove them?
-
Hi Mikkel, I have checked your robots.txt, it looks perfect. If you redirect /index.php to home page that using httaccess file or by using any joomla plugin that would great for you. And its also a permanent solution.
-
Well, I tried the sensible solution and redirecting to the correct URL instead. However the SEF program is quite limited and keep on creating new URLs regardless of my modification. Im looking for a more permanent solution, and the disallow seems at bit simple as I'm not a super programmer.
By the way - thanks for quick replys, kudos to both of you!
-
Sure, the website in question is www.vauni.dk
I don't think that there is any inbound links to the index.php pages. They are not easily found.
-
Couldn't you rewrite those /index.php/ urls to remove the /index.php/?
Like this in .htaccess:
RewriteRule ^(.*)$ /index.php/$1 [L]
Only used Joomla once, but there must be a way to configure joomla to just use "/" instead of "/index.php/"?
Update:
Here's a solution to your /index.php/ issue:
http://www.eprcreations.com/remove-index-php-from-joomla-urls/
Once you've updated that, and have your urls working properly without the /index.php/, you could add this slight modification of the rewrite rule above so that all your old /index.php/ urls would be 301'd to your new ones:
RewriteRule ^(.*)$ /index.php/$1 [R=301,L]
Put it underneath the RewriteBase / line they describe in that post.
-
Hi Mikkel,
Do you inbound link pointing to you index.php pages ? If yes, then it might affect your seo. Disallow: /index.ph/ is perfect but after implementing it don't inter link those index.php pages. Can you share me your website URL so that I can show you with example. How to do it.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How to resolve warning of pages with redirect chain when its your http:// to https://www.
how do I write a 301 redirect in the htaccess file so that http:// goes straight to https://www. Moz replyEli profileHey there!Thanks for reaching out to us!
Technical SEO | | VelocityWebsites0 -
No index
Screaming frog spider does index pages on our website like: wp-content/plugins/woocommerce/assets/js/frontend/jquery-ui-touch-punch.min.js?ver=2.3.9 wp-content/plugins/mailchimp-for-wp/assets/css/checkbox.min.css?ver=2.3.2 Is it a bad/good idea to set my parameters in Webmastertools and tell Google not to crawl pages that begin with wp/content? Thanks!
Technical SEO | | Happy-SEO1 -
Sitemap indexation
3 days ago I sent in a new sitemap for a new platform. Its 23.412 pages but until now its only 4 pages (!!) that are indexed according to the Webmaster Tools. Why so few? Our stage-enviroment got indexed (more than 50K pages) in a few days by a mistake.
Technical SEO | | Morten_Hjort0 -
Block Domain in robots.txt
Hi. We had some URLs that were indexed in Google from a www1-subdomain. We have now disabled the URLs (returning a 404 - for other reasons we cannot do a redirect from www1 to www) and blocked via robots.txt. But the amount of indexed pages keeps increasing (for 2 weeks now). Unfortunately, I cannot install Webmaster Tools for this subdomain to tell Google to back off... Any ideas why this could be and whether it's normal? I can send you more domain infos by personal message if you want to have a look at it.
Technical SEO | | zeepartner0 -
Noindex Pages indexed
I'm having problem that gogole is index my search results pages even though i have added the "noindex" metatag. Is the best thing to block the robot from crawling that file using robots.txt?
Technical SEO | | Tedred0 -
Duplicate content issue index.html vs non index.html
Hi I have an issue. In my client's profile, I found that the "index.html" are mostly authoritative than non "index.html", and I found that www. version is more authoritative than non www. The problem is that I find the opposite situation where non "index.html" are more authoritative than "index.html" or non www more authoritative than www. My logic would tell me to still redirect the non"index.html" to "index.html". Am I right? and in the case I find the opposite happening, does it matter if I still redirect the non"index.html" to "index.html"? The same question for www vs non www versions? Thank you
Technical SEO | | Ideas-Money-Art0 -
Bing indexing
Hello, people~ I want to discuss about Bing indexation. I have a new web site which opened about 3 months ago. Google has no problem to index my site and all pages within the site indexed by Google. However, Bing and Yahoo is different story. I used manual submission, Bing webmaster tool to let Bing know about the site. However, Bing is not indexing my site yet. I researched about it and found that my site should have some external links before I get index by Bing. I check external links of my site with Google webmaster tool, SEOmoz tool and "link:" on Google. All tools show different number as below. Google webmaster Tool : more than 50 SEMoz site explorer : 5 link: on Google: none Why all method of checking links are different and which on should most depend on? Also how many links should I have in order to get index by Bing? Could you people please share your opinion?
Technical SEO | | Artience0 -
Quick robots.txt check
We're working on an SEO update for http://www.gear-zone.co.uk at the moment, and I was wondering if someone could take a quick look at the new robots file (http://gearzone.affinitynewmedia.com/robots.txt) to make sure we haven't missed anything? Thanks
Technical SEO | | neooptic0