Robots.txt to disallow /index.php/ path
-
Hi SEOmoz,
I have a problem with my Joomla site (yeah - me too!). I get a large amount of /index.php/ urls despite using a program to handle these issues. The URLs cause indexation errors with google (404). Now, I fixed this issue once before, but the problem persist. So I thought, instead of wasting more time, couldnt I just disallow all paths containing /index.php/ ?.
I don't use that extension, but would it cause me any problems from an SEO perspective?
How do I disallow all index.php's? Is it a simple: Disallow: /index.php/
-
Hi Cyrus,
Thanks for your reply!
Unfortunately the problem is yet to be fixed, I hope that my disallow will work shortly.
It seems that most of the index.php links to each other internally (and from old /index.php/ pages that no longer exist), which is super weird. How google found them does not make any sense to me.
I don't beleive that external sources are linking to these pages either - I mean, how would they find these links anyway?.
-
Hi Mikkel,
Like Chris, I spidered your site and couldn't find any links to /index.php files, which probably indicates one of two things:
- You've fixed the problem - Yay!
- Or Google is finding those links from external sources
- Google found those links at one time in the past, and is still trying to crawl them.
In the Crawl Errors report in Google Webmaster Tools, if you click on the link of each 404, there's often a "linked from" source where you can see where Google discovered the broken link. This is really helpful in rooting out the cause.
Regardless, I'm going to go with #1 and optimistically believe that you were able to fix the problem.
-
If I spider your site I'm not seeing any /index.php urls. Does that mean you did get Joomla to cooperate with your rewriting?
Or was your problem that you'd previously had urls indexed with /index.php/ paths and you needed to remove them?
-
Hi Mikkel, I have checked your robots.txt, it looks perfect. If you redirect /index.php to home page that using httaccess file or by using any joomla plugin that would great for you. And its also a permanent solution.
-
Well, I tried the sensible solution and redirecting to the correct URL instead. However the SEF program is quite limited and keep on creating new URLs regardless of my modification. Im looking for a more permanent solution, and the disallow seems at bit simple as I'm not a super programmer.
By the way - thanks for quick replys, kudos to both of you!
-
Sure, the website in question is www.vauni.dk
I don't think that there is any inbound links to the index.php pages. They are not easily found.
-
Couldn't you rewrite those /index.php/ urls to remove the /index.php/?
Like this in .htaccess:
RewriteRule ^(.*)$ /index.php/$1 [L]
Only used Joomla once, but there must be a way to configure joomla to just use "/" instead of "/index.php/"?
Update:
Here's a solution to your /index.php/ issue:
http://www.eprcreations.com/remove-index-php-from-joomla-urls/
Once you've updated that, and have your urls working properly without the /index.php/, you could add this slight modification of the rewrite rule above so that all your old /index.php/ urls would be 301'd to your new ones:
RewriteRule ^(.*)$ /index.php/$1 [R=301,L]
Put it underneath the RewriteBase / line they describe in that post.
-
Hi Mikkel,
Do you inbound link pointing to you index.php pages ? If yes, then it might affect your seo. Disallow: /index.ph/ is perfect but after implementing it don't inter link those index.php pages. Can you share me your website URL so that I can show you with example. How to do it.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google only indexed 19/94 images
I'm using Yoast SEO and have images (attachments) excluded from sitemaps, which is the recommended method (but could this be wrong?). Most of my images are in my posts; here's the sitemap for posts: https://edwardsturm.com/post-sitemap.xml I also appear on p1 for some good keywords, and my site is getting organic traffic, so I'm not sure why the images aren't being indexed. Here's an example of a well performing article: https://edwardsturm.com/best-games-youtube-2016/ Thanks!
Technical SEO | | Edward_Sturm0 -
Ranking and Indexing Issue
We have an established site www.getinspired365.com that previously wasn't SEO optimised. We are currently in the process of testing out some new pages to see if we can get them to rank in Google, however we are seeing huge fluctuations in where they rank. Within the first few days we saw our page rank on the first or second page, however it has now dropped out of the top 250 search results. We are wondering if we have made any mistakes with our optimisation ? Example Page : Keyword to target - "If you laugh, you think, and you cry, that's a full day. That's a heck of a day. You do that seven days a week, you're going to have something special." URL : http://www.getinspired365.com/if-you-laugh-you-think-and-you-cry-thats-a-full-day-thats-a-heck-of-a-day-you-do-that-seven-days-a-week-youre-going-to-have-something-special We can see it has been indexed by Google but is now not ranking in the top 250 search engine results. We have run the On Page Grader from SEOMoz and it ranks the page as an "A" so we suspect that we are doing the SEO ok on the page, but can't work out why it isn't ranking, despite ranking on the first or second page after a few days ? We have other pages that aren't SEO optimised that rank better than our newly SEO optimised pages e.g. Keyword - "THE BEST LOVE IS THE KIND THAT AWAKENS THE SOUL AND MAKES US REACH FOR MORE, THAT PLANTS A FIRE IN OUR HEARTS AND BRINGS PEACE TO OUR MINDS. AND THAT'S WHAT YOU'VE GIVEN ME. THAT'S WHAT I'D HOPED TO GIVE YOU FOREVER" URL: http://www.getinspired365.com/20130528 Any advice you could offer would be great. Thanks ! Mike
Technical SEO | | MichaelWhyley0 -
Google not indexing /showing my site in search results...
Hi there, I know there are answers all over the web to this type of question (and in Webmaster tools) however, I think I have a specific problem that I can't really find an answer to online. site is: www.lizlinkleter.com Firstly, the site has been live for over 2 weeks... I have done everything from adding analytics, to submitting a sitemap, to adding to webmaster tools, to fetching each individual page as googlebot and then submitting to index via webmaster tools. I've checked my robot files and code elsewhere on the site and the site is not blocking search engines (as far as I can see) There are no security issues in webmaster tools or MOZ. Google says it has indexed 31 pages in the 'Index Status' section, but on the site dashboard it says only 2 URLS are indexed. When I do a site:www.lizlinketer.com search the only results I get are pages that are excluded in the robots file: /xmlrpc.php & /admin-ajax.php. Now, here's where I think the issue stems from - I developed the site myself for my wife and I am new to doing this, so I developed it on the live URL (I now know this was silly) - I did block the content from search engines and have the site passworded, but I think Google must have crawled the site before I did this - the issue with this was that I had pulled in the Wordpress theme's dummy content to make the site easier to build - so lots of nasty dupe content. The site took me a couple of months to construct (working on it on and off) and I eventually pushed it live and submitted to Analytics and webmaster tools (obviously it was all original content at this stage)... But this is where I made another mistake - I submitted an old site map that had quite a few old dummy content URLs in there... I corrected this almost immediately, but it probably did not look good to Google... My guess is that Google is punishing me for having the dummy content on the site when it first went live - fair enough - I was stupid - but how can I get it to index the real site?! My question is, with no tech issues to clear up (I can't resubmit site through webmaster tools) how can I get Google to take notice of the site and have it show up in search results? Your help would be massively appreciated! Regards, Fraser
Technical SEO | | valdarama0 -
Robots.txt
www.mywebsite.com**/details/**home-to-mome-4596 www.mywebsite.com**/details/**home-moving-4599 www.mywebsite.com**/details/**1-bedroom-apartment-4601 www.mywebsite.com**/details/**4-bedroom-apartment-4612 We have so many pages like this, we do not want to Google crawl this pages So we added the following code to Robots.txt User-agent: Googlebot Disallow: /details/ This code is correct?
Technical SEO | | iskq0 -
Bing and Yahoo Indexing
I have a young site (6 most) that is almost completely indexed by Google but Bing and Yahoo will only index a few pages. Does anyone have any tips for getting more pages indexed in Bing and Yahoo. The site is registered with Bing Webmaster tools and has a valid XML sitemmap.
Technical SEO | | waynekolenchuk0 -
Should I allow index of category / tag pages on Wordpress?
Quite simply, is it best to allow index of category / tag pages on a Wordpress blog or no index them? My thought is Google will / might see it as duplicate content? Thanks, K
Technical SEO | | SEOKeith0 -
Robots.txt not working?
Hello This is my robots.txt file http://www.theprinterdepo.com/Robots.txt However I have 8000 warnings on my dashboard like this:4 What am I missing on the file¿ Crawl Diagnostics Report On-Page Properties <dl> <dt>Title</dt> <dd>Not present/empty</dd> <dt>Meta Description</dt> <dd>Not present/empty</dd> <dt>Meta Robots</dt> <dd>Not present/empty</dd> <dt>Meta Refresh</dt> <dd>Not present/empty</dd> </dl> URL: http://www.theprinterdepo.com/catalog/product_compare/add/product/100/uenc/aHR0cDovL3d3dy50aGVwcmludGVyZGVwby5jb20vaHAtbWFpbnRlbmFjZS1raXQtZm9yLTQtbGo0LWxqNS1mb3ItZXhjaGFuZ2UtcmVmdWJpc2hlZA,,/ 0 Errors No errors found! 1 Warning 302 (Temporary Redirect) Found about 5 hours ago <a class="more">Read More</a>
Technical SEO | | levalencia10 -
De-indexing thin content & Panda--any advantage to immediate de-indexing?
We added the nonidex, follow tag to our site about a week ago on several hundred URLs, and they are still in Google's index. I know de-indexing takes time, but I am wondering if having those URLs in the index will continue to "pandalize" the site. Would it be better to use the URL removal request? Or, should we just wait for the noindex tags to remove the URLs from the index?
Technical SEO | | nicole.healthline0