Our crawler was not able to access the robots.txt file on your site

tigersohelll

Hello Mozzers!

I've received an error message saying the site can't be crawled because Moz is unable to access the robots.txt file. I've spoken to the webmaster, and he can't understand why Moz can't access it.

https://www.thefurnshop.co.uk/robots.txt

Google isn't flagging any problems with it.

Does anyone know how to solve this problem?

Thanks

jaytarr

@LoganRay This was our issue. We didn't know Moz tries to retrieve the HTTP robots.txt first. Our HTTPS redirect was broken for static files only, so the HTTP path to the robots.txt was failing. We did not notice it because the HSTS policy was forcing the browser to redirect to HTTPS before the request ever left it.
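For anyone who wants to reproduce this outside a browser: a minimal Python sketch that requests a URL once, without following redirects and without any HSTS behaviour, so the raw response a crawler would get is visible. The function names and the User-Agent string are made up for illustration; this is not how Moz's crawler is implemented.

```python
import http.client
from urllib.parse import urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)

def fetch_no_redirect(url, timeout=10):
    """Request url once, without following redirects.

    Returns (status_code, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        # Hypothetical User-Agent, used only to label the request.
        conn.request("GET", parts.path or "/",
                     headers={"User-Agent": "robots-check-sketch"})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()

def describe(status, location):
    """Turn a raw (status, Location) pair into a short diagnosis."""
    if status == 200:
        return "ok"
    if status in REDIRECT_CODES:
        if location:
            return f"redirect to {location}"
        return "redirect without Location header"
    return f"failed with HTTP {status}"

# Example use (needs network access):
# status, location = fetch_no_redirect("http://www.thefurnshop.co.uk/robots.txt")
# print(describe(status, location))
```

Running it against both the http:// and https:// versions of the robots.txt URL should show whether the plain HTTP request fails even though the browser looks fine.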

LoganRay

Wanted to jump back in on this topic as I've just confirmed my initial suspicion.

I just added a new client to our Moz account and had the exact same issue: the crawler was unable to access the robots.txt file. It's a secure site and was configured in Moz without the HTTPS. When I go to the robots.txt file without https://www, it redirects the same way yours does, with the / between the TLD and the page path removed.

Reconfigure your site in Moz with the full https://www address and it should begin to work.

Tenlo

There are two parts of your robots.txt that could be causing this, and it all depends on how each bot interprets the pattern matching in your robots.txt:

First, your Disallow: /? can be read as Disallow all paths starting with "/" with 0 to infinity characters "" and one character "?". Try replacing this part with Disallow: /*? to make it not crawl anything with a query string (which is what I believe you were going for).

Second, you have an open Disallow followed by the User-agent: rogerbot line. While this should not be read as applying to rogerbot, once again it all depends on how each bot reads the directives. To fix this, change the Disallow following your Googlebot-Image line to Disallow: /
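To see the difference between the two patterns concretely, here is a small Python sketch of the wildcard matching described in Google's robots.txt documentation and RFC 9309: `*` matches any run of characters, a trailing `$` anchors the end of the URL, and everything else (including `?`) is literal and matched as a prefix. The `rule_matches` helper and the sample paths are made up for illustration, and other bots, rogerbot included, may not follow these rules exactly.

```python
import re

def rule_matches(pattern, path):
    """Return True if a robots.txt path pattern matches the given URL path.

    '*' matches any sequence of characters, a trailing '$' anchors the end,
    and all other characters are literal (the rule is a prefix match).
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start of the path, giving prefix semantics.
    return re.match(regex, path) is not None

# Hypothetical paths, not taken from the site in question.
for pattern in ("/?", "/*?"):
    for path in ("/?page=2", "/sofas?page=2", "/sofas"):
        print(pattern, path, rule_matches(pattern, path))
```

Under these rules, Disallow: /? only covers URLs that begin with a literal "/?", while Disallow: /*? covers any URL containing a query string.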

LoganRay

Hi there,

There's something odd going on when I try to access your robots.txt file without the www. The www gets added back on, but when it does, the slash between the TLD and the page path gets deleted (see below). I'm guessing your domain in Moz is configured without the www, which means RogerBot is getting redirected to this slash-less version of the file.

https://www.thefurnshop.co.ukrobots.txt
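A quick way to catch this kind of broken redirect target is to parse the Location value and compare its host and path with what you expect. A rough Python sketch, where `redirect_problem` is a made-up helper name:

```python
from urllib.parse import urlsplit

def redirect_problem(location, expected_host, expected_path="/robots.txt"):
    """Return a description of what is wrong with a redirect target,
    or None if it points at the expected host and path."""
    parts = urlsplit(location)
    if parts.hostname != expected_host:
        return f"unexpected host: {parts.hostname}"
    if parts.path != expected_path:
        return f"unexpected path: {parts.path!r}"
    return None

# The slash-less URL from this thread parses as one long hostname.
print(redirect_problem("https://www.thefurnshop.co.ukrobots.txt",
                       "www.thefurnshop.co.uk"))
```

Because the slash is missing, the file name ends up as part of the hostname, so a crawler following the redirect would be looking up a host that does not exist.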
