"Extremely high number of URLs" warning for robots.txt blocked pages

EhrenReilly

I have a section of my site that is exclusively for tracking redirects for paid ads. All URLs under this path do a 302 redirect through our ad tracking system:

http://www.mysite.com/trackingredirect/blue-widgets?ad_id=1234567 --302--> http://www.mysite.com/blue-widgets

This path of the site is blocked by our robots.txt, and none of the pages show up for a site: search.

User-agent: *
Disallow: /trackingredirect
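One quick way to sanity-check that a rule like this covers the tracking URLs is to run it through a robots.txt parser. The sketch below uses Python's standard `urllib.robotparser` with the two example URLs from this post. It only shows how that parser reads the rule; Googlebot's own matching may differ in edge cases, so treat it as a rough check rather than proof of what Google does.

```python
from urllib.robotparser import RobotFileParser

# The rule from the post, fed to the parser as lines (no network access needed).
robots_lines = [
    "User-agent: *",
    "Disallow: /trackingredirect",
]

parser = RobotFileParser()
parser.parse(robots_lines)

# Example URLs taken from the post.
tracking_url = "http://www.mysite.com/trackingredirect/blue-widgets?ad_id=1234567"
landing_url = "http://www.mysite.com/blue-widgets"

# can_fetch() returns True if the given user agent may crawl the URL.
tracking_allowed = parser.can_fetch("Googlebot", tracking_url)
landing_allowed = parser.can_fetch("Googlebot", landing_url)

print("tracking URL crawlable:", tracking_allowed)
print("landing URL crawlable:", landing_allowed)
```

If the rule is doing its job, the tracking URL should come back as not crawlable and the landing page as crawlable.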

However, I keep receiving messages in Google Webmaster Tools about an "extremely high number of URLs", and the URLs listed are in my redirect directory, which is ostensibly not indexed.

If not by robots.txt, how can I keep Googlebot from wasting crawl time on these millions of /trackingredirect/ links?

KristinaKledzik

Awesome, good to know things are all okay!

EhrenReilly

Yes, Google does not appear to be crawling or indexing any of the pages in question, and GWT doesn't note any issues with crawl budget.

KristinaKledzik

And everything looks okay in your GWT?

EhrenReilly

This is what my other research has suggested, as well. Google is "discovering" millions of URLs that go into a queue to get crawled, and they're reporting the extremely high number of URLs in Webmaster Tools before they actually attempt to crawl, and see that all these URLs are blocked by robots.txt.

KristinaKledzik

Hi Ehren,

Google has said that they send those warnings before they actually crawl your site (why they would bother you with a warning so quickly, I don't know), so I wouldn't worry about this if the warning is the only sign you're getting that Google might be crawling disallowed pages.

What is your Google Webmaster Tools account saying? If Google isn't reporting to you that it's spending too long crawling your site, and the correct number of pages are indexed, you should be fine.
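If you have access to your server logs, you can also check directly whether Googlebot is requesting the blocked path. Here is a rough sketch that assumes an Apache/nginx "combined" log format; the function name, the sample log lines, and the simple user-agent match are all made up for illustration. Note that a user-agent string can be spoofed, so this is only a first look.

```python
import re

# Pulls the request path and user agent out of a combined-format log line.
# Assumes the common layout: "METHOD path protocol" status bytes "referer" "agent"
LOG_PATTERN = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def count_googlebot_hits(log_lines, prefix="/trackingredirect"):
    """Count requests under `prefix` whose user agent mentions Googlebot."""
    hits = 0
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        if "Googlebot" in match.group("agent") and match.group("path").startswith(prefix):
            hits += 1
    return hits

# Hypothetical sample lines, for illustration only.
sample = [
    '66.249.66.1 - - [01/Jan/2015:00:00:01 +0000] "GET /blue-widgets HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [01/Jan/2015:00:00:02 +0000] "GET /trackingredirect/blue-widgets?ad_id=1234567 HTTP/1.1" 302 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [01/Jan/2015:00:00:03 +0000] "GET /trackingredirect/blue-widgets?ad_id=1234567 HTTP/1.1" 302 0 "-" "Mozilla/5.0 (Windows NT 6.1)"',
]

googlebot_hits = count_googlebot_hits(sample)
print(googlebot_hits)
```

A count at or near zero on your real logs would back up the idea that the warning is about discovered URLs, not crawled ones.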

Let me know if this is a bigger problem!

Kristina

EhrenReilly

Federico, my concern is how do I get Google to stop spending so much crawl time on those pages. I don't want Google to waste time crawling pages that are blocked in my robots.txt.

FedeEinhorn

There's nothing you need to do. If you don't want those pages to be indexed, leaving the robots.txt as it is will be fine.

You can mark that in your Webmaster Tools as fixed and Google won't notify you again.
