Robots.txt

Rong

I have a reference page that lists 150 links to blog articles. I use it in a training area of my website. I now get warnings from Moz that it has too many links, so I decided to disallow this page in robots.txt. Below is what appears in the file.

Robots.txt file for http://www.boxtheorygold.com

User-agent: *

Disallow: /blog-links/

My understanding is that this simply tells Google to bypass the page and not crawl it. However, in Webmaster Tools, I used the Fetch tool to check a couple of my blog articles. One returned the expected result. The other returned "access denied" due to robots.txt. Both blog article links are listed on the /blog-links/ reference page.

Question: Why does Google refuse to crawl the one article (using the Fetch tool) when it is not referenced at all in the robots.txt file? Why is access denied? Should I have used a noindex on this page instead of robots.txt? I am fearful that robots.txt may be blocking many of my blog articles. Please advise.

Thanks,
Ron

OlegKorneitchouk

User-agent: *
Disallow: /blog-links/

Will prevent compliant spiders from crawling content that is located within that specific subfolder. If your articles are not located within that folder, then they should not be blocked. Maybe check for meta noindex tags on the actual articles? You should also keep an eye on the "Blocked URLs" page in GWT (Google Webmaster Tools) to see if there are pages being blocked that shouldn't be.
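If you want to double-check locally which URLs that rule actually matches, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The rules are copied from the question; the article path `/blog/some-article/` and the helper name `is_allowed` are made up for illustration, so swap in your real article URLs. Google's own parser may differ in edge cases, so treat this as a sanity check rather than the final word.

```python
from urllib.robotparser import RobotFileParser

# Rules as posted in the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blog-links/
"""

SITE = "http://www.boxtheorygold.com"


def is_allowed(url, user_agent="Googlebot", robots_txt=ROBOTS_TXT):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The reference page itself sits under the disallowed prefix.
    print(is_allowed(SITE + "/blog-links/"))
    # Hypothetical article path outside /blog-links/; replace with a real one.
    print(is_allowed(SITE + "/blog/some-article/"))
```

If an article URL comes back as allowed here but Fetch still reports it as denied, the live robots.txt on the server may differ from the text above (for example, an extra Disallow line), so it is worth comparing against the file actually being served.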
