Crawl Errors Confusing Me
-
The SEOMoz crawl tool is telling me that I have a slew of crawl errors on the blog of one domain. All of them are related to MSNbot, and they involve trackbacks (which we do want to block, right?) and attachments (it makes sense to block those, too). Any idea why these show up as crawl issues for MSNbot and not for Google? My robots.txt is here: http://www.wevegotthekeys.com/robots.txt.
Thanks, MJ
-
I'm a little late to the party, but I want to summarize what I see as the answer.
1. The "Search Engine Blocked by Robots.txt" is only a warning, and not an error. If you intend for these pages not to get crawled (and it does seem like you have a good reason for this), then there is nothing to worry about.
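2. The reason the warning appears for MSNbot and not Google is that your robots.txt currently allows Google to crawl those files. As Daniel pointed out, to block them for Google as well you would need to add the identical Disallow directives to the Googlebot section of your robots.txt file. Does that make sense? Or you could put all of these rules under the User-agent: * group so they apply to all robots, but keep in mind that a crawler which finds a group naming it specifically (like the existing Googlebot and Msnbot groups) follows only that group and ignores the * group.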
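To see why the same URLs can be blocked for MSNbot but not for Googlebot, here is a minimal Python sketch of how a crawler picks which group to obey. It assumes exact, case-insensitive user-agent token matching with a fallback to the * group (real crawlers match product tokens more loosely). ROBOTS is a hypothetical, trimmed-down file modeled on the one in this thread, with an illustrative * group added, and SomeOtherBot is a made-up crawler name.

```python
def parse_groups(robots_txt):
    """Map each lowercased User-agent token to its list of (directive, value) rules."""
    groups = {}
    current_agents = []
    last_was_agent = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue  # skip blank or malformed lines
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            # Consecutive User-agent lines share one group; a new run starts a new group.
            if not last_was_agent:
                current_agents = []
            token = value.lower()
            current_agents.append(token)
            groups.setdefault(token, [])
            last_was_agent = True
        else:
            for agent in current_agents:
                groups[agent].append((field, value))
            last_was_agent = False
    return groups


def disallows_for(groups, token):
    """Disallow patterns a bot obeys: its own group if one names it, otherwise the '*' group."""
    rules = groups.get(token.lower(), groups.get("*", []))
    return [value for field, value in rules if field == "disallow"]


# Hypothetical robots.txt modeled on the thread (not the real file).
ROBOTS = """\
User-agent: Googlebot
Noindex: /key-west-blog/*trackback

User-agent: Msnbot
Crawl-delay: 120
Disallow: /key-west-blog/*trackback

User-agent: *
Disallow: /key-west-blog/*feed
"""

groups = parse_groups(ROBOTS)
for bot in ("Googlebot", "Msnbot", "SomeOtherBot"):
    print(bot, disallows_for(groups, bot))
```

Under these assumptions Googlebot ends up with no Disallow rules at all: its own group contains only Noindex lines, and because that group exists it never falls back to the * group.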
-
Yes, I thought that's what you meant ... thanks!
-
I am saying this:
User-agent: Googlebot
Noindex: /key-west-blog/*?*
Noindex: /key-west-blog/*.rss
Noindex: /key-west-blog/*feed
Noindex: /key-west-blog/*trackback
Noindex: /key-west-blog/*wp-
Noindex: /key-west-blog/tag/
Noindex: /key-west-blog/search/
Noindex: /key-west-blog/archives/
Noindex: /key-west-blog/category/
Noindex: /key-west-blog/2009
Noindex: /key-west-blog/2010
and this:
User-agent: Googlebot-Mobile
Noindex: /key-west-blog/?
Noindex: /key-west-blog/*.rss
Noindex: /key-west-blog/*feed
Noindex: /key-west-blog/*trackback
Noindex: /key-west-blog/*wp-
Noindex: /key-west-blog/tag/
Noindex: /key-west-blog/search/
Noindex: /key-west-blog/archives/
Noindex: /key-west-blog/category/
Noindex: /key-west-blog/2009
Noindex: /key-west-blog/2010
These groups use Noindex, which is a syntax I am unfamiliar with in robots.txt. You can check out http://www.robotstxt.org/robotstxt.html for more info on robots.txt and proper syntax. I would change Noindex: to Disallow:, and that should fix the error in the robots.txt file.
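If the file has many Noindex: lines, a small Python sketch like the one below can do the swap. rewrite_noindex is a hypothetical helper, not part of any tool: it rewrites lines that begin with Noindex: (case-insensitive) to Disallow:, reports which line numbers changed, and leaves every other line untouched.

```python
import re

# Matches "Noindex:" at the start of a line (optional leading whitespace, any case).
NOINDEX_RE = re.compile(r"^(\s*)noindex\s*:", re.IGNORECASE)


def rewrite_noindex(robots_txt):
    """Return (rewritten_text, changed_line_numbers) with Noindex: turned into Disallow:."""
    changed = []
    out = []
    for lineno, line in enumerate(robots_txt.splitlines(), start=1):
        new_line, count = NOINDEX_RE.subn(r"\1Disallow:", line)
        if count:
            changed.append(lineno)  # 1-based line numbers, like an editor shows
        out.append(new_line)
    return "\n".join(out), changed


sample = "User-agent: Googlebot\nNoindex: /key-west-blog/*feed\nNoindex: /key-west-blog/tag/"
new_text, changed_lines = rewrite_noindex(sample)
print(new_text)
print("changed lines:", changed_lines)
```

Review the output before uploading it, since this only swaps the directive name and does not check whether each rule is one you actually want.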
-
The robots.txt file DOES contain:
User-agent: Msnbot
Crawl-delay: 120
Disallow: /key-west-blog/*?*
Disallow: /key-west-blog/*.rss
Disallow: /key-west-blog/*feed
Disallow: /key-west-blog/*trackback
Disallow: /key-west-blog/*wp-
Disallow: /key-west-blog/*login.php
Disallow: /key-west-blog/tag/
Disallow: /key-west-blog/search/
Disallow: /key-west-blog/archives/
Disallow: /key-west-blog/category/
Disallow: /key-west-blog/2009
Disallow: /key-west-blog/2010
But you are saying I should remove the lines with Noindex?
-
In your robots.txt file, you have the Disallow: directive under Msnbot and Noindex: under Googlebot. Noindex is not a robots.txt command. Change Noindex: to Disallow: and those pages will be blocked for Googlebot and Googlebot-Mobile as well as MSNbot. I'm not sure if that is what is causing the issue, but it would explain the discrepancy. If you want to noindex a page, you do it with a meta tag in the page's <head>, like this:
<meta name="robots" content="noindex, follow">
You can change follow to nofollow if you want; it really doesn't matter much.
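To confirm that a page actually carries the meta robots tag after you add it, here is a rough Python sketch using the standard library's html.parser. robots_meta is a hypothetical helper name; it only reads tags with name="robots" and ignores crawler-specific names such as name="googlebot".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; values may be None.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                if part.strip():
                    self.directives.append(part.strip().lower())


def robots_meta(html):
    """Return the lowercased meta robots directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(robots_meta(page))
```

Self-closing tags like <meta ... /> are handled too, because HTMLParser's default handle_startendtag calls handle_starttag.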
-
I have the same problem. It looks like MSNbot is disallowed from accessing WordPress content, because pages show up as ?page=111, and from what I understand so far, anything matching the pattern below is blocked for MSNbot. I don't have a definite answer for you as to what to do, but I can tell you that you will need to "allow" MSNbot the same way Googlebot is allowed.
Disallow: /key-west-blog/*?*
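Here is a minimal Python sketch of how a wildcard rule like the one above matches URLs, assuming the common robots.txt pattern semantics: * matches any run of characters, a trailing $ anchors the end of the URL, and otherwise the rule is a prefix match. It only checks Disallow patterns and ignores Allow rules and precedence between rules. MSNBOT_RULES is a hypothetical subset of the Msnbot group in this thread, and /key-west-blog/some-post/ is a made-up URL.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the URL.
    Without '$', the pattern is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_blocked(path, disallow_patterns):
    """True if any Disallow pattern matches the path (Allow rules are ignored)."""
    # re.match anchors at the start of the string, mirroring robots.txt prefix matching.
    return any(pattern_to_regex(p).match(path) for p in disallow_patterns)


MSNBOT_RULES = ["/key-west-blog/*?*", "/key-west-blog/*feed", "/key-west-blog/2010"]
for path in ("/key-west-blog/?page=111", "/key-west-blog/some-post/", "/key-west-blog/feed/"):
    print(path, is_blocked(path, MSNBOT_RULES))
```

The ?page=111 URL is caught by /key-west-blog/*?* because that pattern matches any blog URL that contains a query string.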