Our crawler was not able to access the robots.txt file on your site
-
Hello Mozzers!
I've received an error message saying the site can't be crawled because Moz is unable to access the robots.txt file. I've spoken to the webmaster, and he can't understand why Moz can't access robots.txt.
The file is at https://www.thefurnshop.co.uk/robots.txt and Google isn't flagging anything up to us.
Does anyone know how to solve this problem?
Thanks
-
@LoganRay This was our issue. We didn't know Moz tries to retrieve the HTTP robots.txt first. Our HTTPS redirect was broken for static files only, so the HTTP path to robots.txt was failing. We did not notice because the HSTS policy was forcing the browser to redirect.
-
Wanted to jump back in on this topic as I've just confirmed my initial suspicion.
I just added a new client to our Moz account and hit the exact same issue: the crawler was unable to access the robots.txt file. It's a secure site and was configured in Moz without the HTTPS. When I go to the robots.txt file without https://www, it redirects the same way yours does, with the / between the TLD and page path removed.
Reconfigure your site and it should begin to work.
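For anyone who wants to reproduce the check described above outside a browser (where HSTS can hide a broken HTTP redirect), here is a minimal Python sketch. It requests each scheme/host variant of robots.txt once without following redirects, then flags a redirect target that has lost its slash. The function names, the `robots-check/0.1` user agent, and the `example.com` placeholder are my own assumptions, not anything Moz publishes, and this does not claim to replicate how rogerbot actually fetches the file.

```python
import http.client
from typing import List, Optional, Tuple
from urllib.parse import urlsplit


def candidate_urls(domain: str, path: str = "/robots.txt") -> List[str]:
    """Return the four scheme/host variants a crawler might request first."""
    bare = domain[4:] if domain.startswith("www.") else domain
    return [
        f"{scheme}://{host}{path}"
        for scheme in ("http", "https")
        for host in (bare, f"www.{bare}")
    ]


def first_hop(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Request url once, without following redirects.

    Returns the status code and the Location header (None if absent).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        # Hypothetical user agent string, used only for this check.
        conn.request("GET", parts.path or "/",
                     headers={"User-Agent": "robots-check/0.1"})
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def lost_slash(location: Optional[str], path: str = "/robots.txt") -> bool:
    """True when a redirect target ends in the file name but not in '/' + name."""
    if not location:
        return False
    name = path.lstrip("/")
    return location.endswith(name) and not location.endswith(path)


if __name__ == "__main__":
    # Replace the placeholder with the domain as configured in your campaign.
    for url in candidate_urls("example.com"):
        try:
            status, location = first_hop(url)
        except OSError as exc:
            print(f"{url} -> request failed: {exc}")
            continue
        note = "  <-- slash missing in redirect" if lost_slash(location) else ""
        print(f"{url} -> {status} {location or ''}{note}")
```

A healthy setup would normally show a 301 or 308 from the HTTP variants to one HTTPS URL that returns 200. A failed request or a flagged redirect on the plain HTTP variants would match the symptoms described in this thread.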
-
There are two parts of your robots.txt that could be causing this, and it all depends on how each bot interprets the pattern matching in your robots.txt:
First, your Disallow: /? can be read as Disallow all paths starting with "/" with 0 to infinity characters "" and one character "?". Try replacing this part with Disallow: /*? to make it not crawl anything with a query string (which is what I believe you were going for).
Second, you have an open Disallow followed by User-agent: rogerbot. While it should not be read this way, once again it all depends on how each bot reads the directives. To fix this, change the Disallow following your Googlebot-Image to Disallow: /
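To see how one standards-following parser treats an open Disallow that is followed directly by another User-agent line, here is a small sketch using Python's built-in `urllib.robotparser`. The robots.txt content below is a hypothetical stand-in, not the actual file from the site, and the `allowed` helper is my own name. Note that this module does simple prefix matching and, as far as I know, does not implement `*` wildcards in paths, so it says nothing about the query-string rule above. Other bots, rogerbot included, may parse things differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: an empty Disallow followed immediately by the next group.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow:
User-agent: rogerbot
Disallow: /private/
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots_txt and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    checks = [
        ("Googlebot-Image", "https://www.example.com/images/a.jpg"),
        ("rogerbot", "https://www.example.com/private/page"),
        ("rogerbot", "https://www.example.com/products"),
    ]
    for agent, url in checks:
        print(agent, url, allowed(ROBOTS_TXT, agent, url))
```

With this parser the empty Disallow only means "allow everything" for Googlebot-Image and does not leak into the rogerbot group. That is the behaviour the original spec describes, but a less careful bot could still get it wrong.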
-
Hi there,
There's something odd going on when I try to access your robots.txt file without the www. The www gets added back on, but when it does, the slash between the TLD and page path gets deleted. I'm guessing your domain in Moz is configured without the www, which means RogerBot is getting redirected to this slash-less version of the file.