"Extremely high number of URLs" warning for robots.txt blocked pages
-
I have a section of my site that is exclusively for tracking redirects for paid ads. All URLs under this path do a 302 redirect through our ad tracking system:
http://www.mysite.com/trackingredirect/blue-widgets?ad_id=1234567 --302--> http://www.mysite.com/blue-widgets
This path of the site is blocked by our robots.txt, and none of the pages show up for a site: search.
User-agent: *
Disallow: /trackingredirect
However, I keep receiving messages in Google Webmaster Tools about an "extremely high number of URLs", and the URLs listed are in my redirect directory, which is ostensibly not indexed.
If not by robots.txt, how can I keep Googlebot from wasting crawl time on these millions of /trackingredirect/ links?
-
Awesome, good to know things are all okay!
-
Yes, Google does not appear to be crawling or indexing any of the pages in question, and GWT doesn't note any issues with crawl budget.
-
And everything looks okay in your GWT?
-
This is what my other research has suggested, as well. Google is "discovering" millions of URLs that go into a queue to get crawled, and they're reporting the extremely high number of URLs in Webmaster Tools before they actually attempt to crawl, and see that all these URLs are blocked by robots.txt.
-
Hi Ehren,
Google has said that they send those warnings before they actually crawl your site (why they would bother you with a warning so quickly, I don't know), so I wouldn't worry about this if the warning is the only sign you're getting that Google might be crawling disallowed pages.
What is your Google Webmaster Tools account saying? If Google isn't reporting to you that it's spending too long crawling your site, and the correct number of pages are indexed, you should be fine.
Let me know if this is a bigger problem!
Kristina
-
Federico, my concern is how do I get Google to spend spending so much crawl time on those pages. I don't want Google to waste time crawling pages that are blocked in my robots.txt.
-
There's nothing you need to do. If you don't want those pages to be indexed leaving the robots.txt as it is is fine.
You can mark that in your Webmaster Tools as fixed and Google won't notify you again.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Extreme high number of pages found on webshop
Hi, Im working for the first time on a magento webshop. But i run into a problem where crawlers find then thousands of pages while there are a few hunderd products. I expect is has something to do with filters that generate dynamic URL's. I can't find any setting in Magento to prevent this and i think this will hurt SEO performance because of duplicate content and high amount of pages that need to be crawled while the site has no authority. What would my approach be to solve this? Do i need to ad certain tags to the pages or are these settings in my robots file.
Technical SEO | | J05B0 -
Removing a large number of unnecessary pages from a site
Hi all, I got a big problem with my website. I have a lot of page, duplicate page made from various combinations of selects, and for all this duplicate content we've be hit by a panda update 2 years ago. I don't want to bring new content an all of these pages, about 3.000.000, because most of them are unnecessary. Google indexed all of them (3.000.000), and I want to redirect the pages that I don't need anymore to the most important ones. My question, is there any problem in how google will see this change, because after this it will remain only 5000-6000 relevant pages?
Technical SEO | | Silviu0 -
Is it easier to rank high with a front page than a landing page?
My product is laptop and of cause, I like to rank high for the keyword "laptop". Do any of you know if the search engines tends to rank a front page higher than a landing page? Eg. www.brand.com vs. www.brand.com/laptop
Technical SEO | | Debitoor0 -
Can i use "nofollow" tag on product page (duplicated content)?
Hi, im working on my webstore SEO. I got descriptions from official seller like "Bosch". I got more than 15.000 items so i cant create unique content for each product. Can i use nofollow tag for each product and create great content on category pages? I dont wanna lose rankings because duplicated content. Thank you for help!
Technical SEO | | pejtupizdo0 -
Blocking Test Pages Enmasse on Sub-domain
Hello, We have thousands of test pages on a sub-domain of our site. Unfortunately at some point, these pages were visible to search engines and got indexed. Subsequently, we made a change to the robots.txt file for the test sub-domain. Gradually, over a period of a few weeks, the impressions and clicks as reported by Google Webmaster Tools fell off for the test. sub-domain. We are not able to implement the no index tag in the head section of the pages given the limitations of our CMS. Would blocking off Google bot via the firewall enmasse for all the test pages have any negative consequences for the main domain that houses the real live content for our sites (which we would like to of course remain in the Google index). Many thanks
Technical SEO | | CeeC-Blogger0 -
Have I constructed my robots.txt file correctly for sitemap autodiscovery?
Hi, Here is my sitemap: User-agent: * Sitemap: http://www.bedsite.co.uk/sitemaps/sitemap.xml Directories Disallow: /sendfriend/
Technical SEO | | Bedsite
Disallow: /catalog/product_compare/
Disallow: /media/catalog/product/cache/
Disallow: /checkout/
Disallow: /categories/
Disallow: /blog/index.php/
Disallow: /catalogsearch/result/index/
Disallow: /links.html I'm using Magento and want to make sure I have constructed my robots.txt file correctly with the sitemap autodiscovery? thanks,0 -
How do I 301 redirect a number of pages to one page
I want to redirect all pages in /folder_A /folder_B to /folder_A/index.php. Can I just write one or two lines of code to .htaccess to do that?
Technical SEO | | Heydarian0 -
What to do about "blocked by meta-robots"?
The crawl report tells me "Notices are interesting facts about your pages we found while crawling". One of these interesting facts is that my blog archives are "blocked by meta robots". Articles are not blocked, just the archives. What is a "meta" robot? I think its just normal (since the article need only be crawled once) but want a second opinion. Should I care about this?
Technical SEO | | GPN0