Robots.txt file getting a 500 error - is this a problem?
-
Hello all!
While doing some routine health checks on a few of our client sites, I spotted that a new client of ours - whose website was not designed or built by us - is returning a 500 internal server error when I try to look at the robots.txt file.
As we don't host or maintain their site, I would have to go through their head office to get this changed. That isn't a problem, but I just wanted to check whether this error will actually be having a negative effect on their site, and whether there's a benefit to getting it fixed?
Thanks in advance!
-
Hi Barry,
Thanks for your swift response on this. The pages certainly seem to be getting cached correctly, and when we initially took over the SEO and made wholesale changes to the site there were huge improvements, so it looks for all the world like the main pages, at least, are being crawled.
But I think you make a good point about getting it solved anyway, so we can identify any problems that may be occurring now or could occur later.
-
robots.txt isn't a requirement - indeed, it's only voluntarily followed by spiders (as in, they can choose to ignore it) - so I think you'll be fine without it. The default is 'allow all' and 'follow, index', so they should still be crawling the site correctly.
Check in Webmaster Tools by fetching as Googlebot, or alternatively find a page, put cache:pageurl.html into Google and see whether it has been cached correctly.
That said, returning a 500 instead of a 404 may be causing an issue that isn't obviously apparent - 500 is a bit too generic a message to say specifically what - but I would try to solve it as quickly as possible. The benefits will depend on what you put in your robots.txt file.
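If it helps, a quick way to see exactly which status code the file is returning (outside of Webmaster Tools) is to request it directly. Here's a rough sketch using Python's standard library - the URL is a placeholder, so swap in the client's domain:

import urllib.error
import urllib.request

# Placeholder URL - swap in the client's domain.
URL = "https://www.example.com/robots.txt"

try:
    with urllib.request.urlopen(URL, timeout=10) as response:
        # A 200 means the file is being served; print it for a quick look.
        print(URL, "returned", response.status)
        print(response.read().decode("utf-8", errors="replace"))
except urllib.error.HTTPError as error:
    # A 404 simply means "no robots.txt"; a 500 means the server errors on the request.
    print(URL, "returned", error.code)
except urllib.error.URLError as error:
    print("Request failed:", error.reason)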
Related Questions
-
No index tag robots.txt
Hi Mozzers, A client's website has a lot of internal directories defined as /node/*. I already added the rule 'Disallow: /node/*' to the robots.txt file to prevent bots from crawling these pages. However, the pages are already indexed and appear in the search results. In an article from Deepcrawl, they say you can simply add the rule 'Noindex: /node/*' to the robots.txt file, but other sources claim the only way is to add a noindex directive in the meta robots tag of every page. Can someone tell me which is the best way to prevent these pages from getting indexed? Small note: there are more than 100 pages. Thanks! Jens
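If it helps with auditing, here's a rough spot-check (in Python) for whether a given /node/ page already carries a meta robots noindex tag - the URLs below are placeholders rather than real pages from the site, and the pattern is only a loose match:

import re
import urllib.request

# Placeholder URLs - swap in real /node/ pages from the site.
URLS = [
    "https://www.example.com/node/1",
    "https://www.example.com/node/2",
]

# Loose match for a meta robots tag; each matched tag is checked for "noindex" below.
META_ROBOTS = re.compile(r'<meta[^>]*name=["\']robots["\'][^>]*>', re.IGNORECASE)

for url in URLS:
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            html = response.read().decode("utf-8", errors="replace")
        tags = META_ROBOTS.findall(html)
        noindexed = any("noindex" in tag.lower() for tag in tags)
        print(url, "-> noindex found" if noindexed else "-> no noindex tag found")
    except Exception as exc:
        print(url, "-> request failed:", exc)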
Technical SEO | WeAreDigital_BE
-
Are dashes a problem for SEO?
My website is http://www.green-lotus-trekking.com - are the dashes in the domain a problem for Google search engine optimization? Is it only a small issue, or a serious one? I'm totally confused.
Technical SEO | agsln
-
Clone TLD Problems
I have an online services website, www.geekwik.com, which I started 3 months back. I also recently made a clone on the TLD geekwik.in, which has the same content; only the pricing is in INR and it is targeted at India users, while geekwik.com is targeted at global users with pricing in USD. How do I manage these two sites so that I do not face a duplicate content penalty from Google and the sites do not cannibalize each other? Is there anything specific I need to do in robots.txt, .htaccess, sitemaps, hreflang, etc.? I personally feel that after putting up geekwik.in a couple of weeks ago, the ranking of geekwik.com went down and I started getting fewer search queries. I will be putting up an IP-based switch on both sites shortly so that Indian users are redirected to the .in TLD and non-Indians are redirected to the .com TLD. From an SEO standpoint, what do I need to do to counter the problems mentioned above? Putting the India version in a subdirectory is also an option.
Technical SEO | geekwik
-
Robots.txt best practices & tips
Hey, I was wondering if someone could give me some advice on whether I should block the robots.txt file from the average user (not from Googlebot, Yandex, etc.)? If so, how would I go about doing this? With .htaccess, I'm guessing - but I'm not an expert. What can people do with the information in the file? Maybe someone can give me some "best practices"? (I have a WordPress-based website.) Thanks in advance!
Technical SEO | JonathanRolande
-
Question about construction of our sitemap URL in robots.txt file
Hi all, This is a Webmaster/SEO question. This is the sitemap URL currently in our robots.txt file: http://www.ccisolutions.com/sitemap.xml As you can see it leads to a page with two URLs on it. Is this a problem? Wouldn't it be better to list both of those XML files as separate line items in the robots.txt file? Thanks! Dana
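For what it's worth, a sitemap index that points at child sitemap files is a normal, supported structure. A short sketch like this (assuming the file at that URL is a standard sitemap index) lists the child sitemaps it references:

import urllib.request
import xml.etree.ElementTree as ET

# Sitemap index URL from the question.
URL = "http://www.ccisolutions.com/sitemap.xml"
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

with urllib.request.urlopen(URL, timeout=10) as response:
    root = ET.fromstring(response.read())

# A sitemap index wraps each child sitemap in <sitemap><loc> elements.
for loc in root.findall("sm:sitemap/sm:loc", NS):
    print(loc.text)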
Technical SEO | danatanseo
-
Canonical tag is pointing to the same page that it is already on - is this a problem?
So we have a WordPress site with the All in One SEO Pack installed. I have just noticed in our crawl diagnostics that a canonical tag has been put in place on every single one of our pages, but they are all pointing to the pages they are already on. Is this a problem? Should I be worried about this, delve more deeply to figure out why it has happened, and get it removed? Thanks
Technical SEO | cttgroup
-
Get iTunes SERP
Hi, I would really like to understand how iTunes (Apple) gets those great SERPs in Google (see attached image). In addition to the image (which is the main thing I would love to get), they have price, rating and vote counts. Currently I know about the schema.org tags that can help implement that, but I couldn't find them in the source code of the page. Thanks [Attached image: itunes-serp]
Technical SEO | WixSeoTeam
-
Robots.txt questions...
All, My site is rather complicated, but I will try to break down my question as simply as possible. I have a robots.txt document in the root level of my site to disallow robot access to /_system/, my CMS. This looks like this:

# /robots.txt file for http://webcrawler.com/
# mail [email protected] for constructive criticism
User-agent: *
Disallow: /_system/

I have another robots.txt file another level down, in my holiday database - www.mysite.com/holiday-database/ - this is to disallow access to /holiday-database/ControlPanel/, my database CMS. This looks like this:

User-agent: *
Disallow: /ControlPanel/

Am I correct in thinking that this file must also be in the root level, and not in the /holiday-database/ level? If so, should my new robots.txt file look like this:

# /robots.txt file for http://webcrawler.com/
# mail [email protected] for constructive criticism
User-agent: *
Disallow: /_system/
Disallow: /holiday-database/ControlPanel/

Or like this:

# /robots.txt file for http://webcrawler.com/
# mail [email protected] for constructive criticism
User-agent: *
Disallow: /_system/
Disallow: /ControlPanel/

Thanks in advance. Matt
Technical SEO | Horizon
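For what it's worth, here is a rough way to sanity-check the combined root-level version with Python's standard-library robots.txt parser - the sample paths are made up from the directories in the question, and this is only an illustrative sketch rather than a definitive answer as to which variant is correct:

from urllib.robotparser import RobotFileParser

# The combined root-level file from the question (the first of the two options above).
RULES = """User-agent: *
Disallow: /_system/
Disallow: /holiday-database/ControlPanel/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Sample paths (made up, based on the directories in the question) to see how a
# standards-following parser reads the rules.
for path in ("/_system/login",
             "/holiday-database/ControlPanel/index.html",
             "/holiday-database/some-holiday.html"):
    print(path, "crawlable:", parser.can_fetch("*", path))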