What can I do if Google Webmaster Tools doesn't recognize the robots.txt file?
-
I'm working on a recently hacked site for a client and and in trying to identify how exactly the hack is running I need to use the fetch as Google bot feature in GWT.
I'd love to use this but it thinks the robots.txt is blocking it's acces but the only thing in the robots.txt file is a link to the sitemap.
Unde the Blocked URLs section of the GWT it shows that the robots.txt was last downloaded yesterday but it's incorrect information. Is there a way to force Google to look again?
-
No, but they might write to it, modify it, or do all sorts of other nasty stuff I've seen hackers do when they get a hold of any writeable file on a system.
-
lol it's a robots text file. what are they going to do. Steal it? I should have clarified do a 777 to make sure that is not your problem, then yes change the permission to be tighter
-
Eesh I don't recommend 777. 644 or, if you're going to change it right back, 755 at most.
-
File permission maybe? Change it to 777 and try it again
-
If you have shell access on Linux you can use wget or GET or run lynx.
If google is getting the wrong robots file then your web server must be sending out something other than what you think is the robots file.
What happens if you do this in your browser:
-
Looking in my log files, Google hits robots.txt just about every time it crawls our site.
What are you trying to accomplish using fetch as Googlebot? Any chance CURL could do the job for you, or another tool that ignores robots.txt?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Do I need a separate robots.txt file for my shop subdomain?
Hello Mozzers! Apologies if this question has been asked before, but I couldn't find an answer so here goes... Currently I have one robots.txt file hosted at https://www.mysitename.org.uk/robots.txt We host our shop on a separate subdomain https://shop.mysitename.org.uk Do I need a separate robots.txt file for my subdomain? (Some Google searches are telling me yes and some no and I've become awfully confused!
Technical SEO | | sjbridle0 -
Bulk URL Removal in Webmaster Tools
One of Wordpress sites was hacked (for about 10 hours), and Google picked up 4000+ urls in the index. The site is fixed, but I'm stuck with all those urls in the index. All the urls of of the form: walkerorthodontics.com/index.php?online-payday-cash-loan.htmloncewe The only bulk removal option I could find was to remove an entire folder, but I can't do that, as it would only leave the homepage and kill off everything else. For some crazy reason, the removal tools doesn't support wildcards, so that obvious solution is right out. So, how do it get rid of 4000 results? And no, waiting around for them to 404 out of the index isn't an option.
Technical SEO | | MichaelGregory0 -
Site address change: new site isn't showing up in Google, old site is gone.
We just transitioned mccacompanies.com to confluentstrategies.com. The problem is that when I search for the old name, the old website doesn't come up anymore to redirect people to the new site. On the local card, Google has even taken off the website altogether. (I'm currently still trying to gain access to manage the business listing) When I search for confluent strategies, the website doesn't come up at all. But if I use the site: operator, it is in the index. Basically, my client has effectively disappeared off the face of the Google. (In doing other name changes, this has never happened to me before) What can I do?
Technical SEO | | MichaelGregory0 -
Switching from HTTP to HTTPS and google webmaster
HI, I've recently moved one of my sites www.thegoldregister.co.uk to https. I'm using wordpress and put in the permanent 301 redirect for all pages to false https for all pages in the htaaccess file. I've updated the settings in google analytics to https for the original site. All seems to be working well. Regarding the google webmaster tools and what needs to be done. I'm very confused by the google documentation on this subject around https. Does all my crawl data and indexing from http site still stand and be inherited by the https version because of the redirects in place. I'm really worried I will lose all of this indexing data, I looked at the "change of address" in the settings of webmaster, but this seems to refer to changing the actual domain name rather than the protocol which i haven't at all. I've also tried adding the https version to the console as well, but the https version is showing a severe warning "is robots.txt blocking some important pages". I don't understand this error as it's the same version and file as the http site being generated by all in one seo pack for wordpress (see below at bottom). The warning is against line 5 saying it will ignore it. What i don't understand is i don't get this error in the webmaster console with the http version which is the same file?? Any help and advice would be much appreciated. Kind regards Steve User-agent: *
Technical SEO | | lqz
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /xmlrpc.php
Crawl-delay: 10 ceLAHIv.jpg0 -
Robots.txt & Mobile Site
Background - Our mobile site is on the same domain as our main site. We use a folder approach for our mobile site abc.com/m/home.html We are re-directing traffic to our mobile site vie device detection and re-direction exists for a handful of pages of our site ie most of our pages do not redirect the user to a mobile equivalent page. Issue – Our mobile pages are being indexed in desktop Google searches Input Required – How should we modify our robots.txt so that the desktop google index does not index our mobile pages/urls User-agent: Googlebot-Mobile Disallow: /m User-agent: `YahooSeeker/M1A1-R2D2` Disallow: /m User-agent: `MSNBOT_Mobile` Disallow: /m Many thanks
Technical SEO | | CeeC-Blogger0 -
Oh no googlebot can not access my robots.txt file
I just receive a n error message from google webmaster Wonder it was something to do with Yoast plugin. Could somebody help me with troubleshooting this? Here's original message Over the last 24 hours, Googlebot encountered 189 errors while attempting to access your robots.txt. To ensure that we didn't crawl any pages listed in that file, we postponed our crawl. Your site's overall robots.txt error rate is 100.0%. Recommended action If the site error rate is 100%: Using a web browser, attempt to access http://www.soobumimphotography.com//robots.txt. If you are able to access it from your browser, then your site may be configured to deny access to googlebot. Check the configuration of your firewall and site to ensure that you are not denying access to googlebot. If your robots.txt is a static page, verify that your web service has proper permissions to access the file. If your robots.txt is dynamically generated, verify that the scripts that generate the robots.txt are properly configured and have permission to run. Check the logs for your website to see if your scripts are failing, and if so attempt to diagnose the cause of the failure. If the site error rate is less than 100%: Using Webmaster Tools, find a day with a high error rate and examine the logs for your web server for that day. Look for errors accessing robots.txt in the logs for that day and fix the causes of those errors. The most likely explanation is that your site is overloaded. Contact your hosting provider and discuss reconfiguring your web server or adding more resources to your website. After you think you've fixed the problem, use Fetch as Google to fetch http://www.soobumimphotography.com//robots.txt to verify that Googlebot can properly access your site.
Technical SEO | | BistosAmerica0 -
Google Webmaster Tools Reporting False Links
I was looking at Google Webmaster Tools and the amount of links that are reported in there are inaccurate. They reported over 50,000 links that created a huge spike in their link graph and I checked some of the links and they don't even have the link on their site. Can anyone help with this?
Technical SEO | | TopFloor0 -
Robots.txt not working?
Hello This is my robots.txt file http://www.theprinterdepo.com/Robots.txt However I have 8000 warnings on my dashboard like this:4 What am I missing on the file¿ Crawl Diagnostics Report On-Page Properties <dl> <dt>Title</dt> <dd>Not present/empty</dd> <dt>Meta Description</dt> <dd>Not present/empty</dd> <dt>Meta Robots</dt> <dd>Not present/empty</dd> <dt>Meta Refresh</dt> <dd>Not present/empty</dd> </dl> URL: http://www.theprinterdepo.com/catalog/product_compare/add/product/100/uenc/aHR0cDovL3d3dy50aGVwcmludGVyZGVwby5jb20vaHAtbWFpbnRlbmFjZS1raXQtZm9yLTQtbGo0LWxqNS1mb3ItZXhjaGFuZ2UtcmVmdWJpc2hlZA,,/ 0 Errors No errors found! 1 Warning 302 (Temporary Redirect) Found about 5 hours ago <a class="more">Read More</a>
Technical SEO | | levalencia10