"Extremely high number of URLs" warning for robots.txt blocked pages

EhrenReilly

I have a section of my site that is exclusively for tracking redirects for paid ads. All URLs under this path do a 302 redirect through our ad tracking system:

http://www.mysite.com/trackingredirect/blue-widgets?ad_id=1234567 --302--> http://www.mysite.com/blue-widgets
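
For readers unfamiliar with this kind of setup, the redirect above can be sketched as a pure function. This is a hypothetical illustration, not the poster's tracking system. The helper name `tracking_redirect` and the `PREFIX` constant are made up. The sketch also assumes the destination is just the path with the `/trackingredirect/` prefix removed and the query string dropped.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit

# Hypothetical prefix for the tracking directory described in the post.
PREFIX = "/trackingredirect/"


def tracking_redirect(url: str) -> Tuple[int, str, Optional[str]]:
    """Return (status, location, ad_id) for an incoming request URL.

    Sketch only: a real ad-tracking system would also record the click
    (e.g. the ad_id) before sending the redirect.
    """
    parts = urlsplit(url)
    if not parts.path.startswith(PREFIX):
        return 404, "", None  # not a tracking URL
    slug = parts.path[len(PREFIX):]
    ad_id = parse_qs(parts.query).get("ad_id", [None])[0]
    # 302 Found: temporary redirect to the landing page, query string dropped.
    location = f"{parts.scheme}://{parts.netloc}/{slug}"
    return 302, location, ad_id


print(tracking_redirect("http://www.mysite.com/trackingredirect/blue-widgets?ad_id=1234567"))
# (302, 'http://www.mysite.com/blue-widgets', '1234567')
```

Each distinct `ad_id` produces a distinct URL that lands on the same page. This is presumably how the tracking directory grows to millions of URLs.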

This path of the site is blocked by our robots.txt, and none of the pages show up for a site: search.

User-agent: *
Disallow: /trackingredirect
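
One way to sanity-check a rule like this is Python's standard-library `urllib.robotparser`. This is a sketch, not Google's parser, but it treats `Disallow` as a plain path-prefix match, which is also how Google matches a rule without wildcards. The rule lines are kept together because some strict parsers, Python's included, treat a blank line as the end of a record. The `/trackingredirect-report` path is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post, with no blank line between
# User-agent and Disallow (a blank line ends a record in Python's parser).
ROBOTS_TXT = """\
User-agent: *
Disallow: /trackingredirect
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path: str) -> bool:
    # "*" rules apply to every crawler, so the agent name is illustrative.
    return parser.can_fetch("Googlebot", "http://www.mysite.com" + path)


print(allowed("/trackingredirect/blue-widgets?ad_id=1234567"))  # False: blocked
print(allowed("/blue-widgets"))                                  # True: landing page is crawlable
print(allowed("/trackingredirect-report"))                       # False: prefix match
```

Because the rule has no trailing slash, it also blocks any path that merely begins with `/trackingredirect`. `Disallow: /trackingredirect/` would limit it to the directory. A `False` result only means a compliant crawler won't fetch the URL. It says nothing about whether the URL has already been discovered or counted.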

However, I keep receiving messages in Google Webmaster Tools about an "extremely high number of URLs", and the URLs listed are in my redirect directory, which is ostensibly not indexed.

If not by robots.txt, how can I keep Googlebot from wasting crawl time on these millions of /trackingredirect/ links?

KristinaKledzik

Awesome, good to know things are all okay!

EhrenReilly

Yes, Google does not appear to be crawling or indexing any of the pages in question, and GWT doesn't note any issues with crawl budget.

KristinaKledzik

And everything looks okay in your GWT?

EhrenReilly

This is what my other research has suggested, as well. Google is "discovering" millions of URLs that go into a queue to be crawled, and it reports the extremely high number of URLs in Webmaster Tools before it actually attempts to crawl them and sees that all these URLs are blocked by robots.txt.
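
The explanation above can be illustrated with a toy model. It is purely hypothetical and says nothing about how Googlebot is actually built. In the model, URLs are counted as soon as they are discovered, and robots.txt is checked only when a fetch is attempted. So the discovered count can be huge while almost nothing is fetched. The function name `crawl_summary` and the URL counts are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def crawl_summary(discovered_urls, robots_lines, user_agent="Googlebot"):
    """Toy model: count discovered URLs vs. URLs a robots.txt-compliant
    crawler would actually fetch. Hypothetical, not Google's pipeline."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    # Discovery: every unique URL enters the queue and is counted
    # before any robots.txt check happens.
    queue = list(dict.fromkeys(discovered_urls))
    # Crawl: robots.txt is consulted only when a fetch is attempted.
    fetched = [u for u in queue if parser.can_fetch(user_agent, u)]
    return {"discovered": len(queue), "fetched": len(fetched)}


robots = ["User-agent: *", "Disallow: /trackingredirect"]
urls = [f"http://www.mysite.com/trackingredirect/blue-widgets?ad_id={i}" for i in range(10_000)]
urls.append("http://www.mysite.com/blue-widgets")
print(crawl_summary(urls, robots))  # {'discovered': 10001, 'fetched': 1}
```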

KristinaKledzik

Hi Ehren,

Google has said that they send those warnings before they actually crawl your site (why they would bother you with a warning so quickly, I don't know), so I wouldn't worry about this if the warning is the only sign you're getting that Google might be crawling disallowed pages.

What is your Google Webmaster Tools account saying? If Google isn't reporting to you that it's spending too long crawling your site, and the correct number of pages are indexed, you should be fine.

Let me know if this is a bigger problem!

Kristina

EhrenReilly

Federico, my concern is how to get Google to stop spending so much crawl time on those pages. I don't want Google to waste time crawling pages that are blocked in my robots.txt.

FedeEinhorn

There's nothing you need to do. If you don't want those pages to be indexed, leaving the robots.txt as it is is fine.

You can mark the warning as fixed in Webmaster Tools, and Google won't notify you again.
