"Extremely high number of URLs" warning for robots.txt blocked pages
-
I have a section of my site that is exclusively for tracking redirects for paid ads. All URLs under this path do a 302 redirect through our ad tracking system:
http://www.mysite.com/trackingredirect/blue-widgets?ad_id=1234567 --302--> http://www.mysite.com/blue-widgets
This path of the site is blocked by our robots.txt, and none of the pages show up for a site: search.
User-agent: *
Disallow: /trackingredirect
However, I keep receiving messages in Google Webmaster Tools about an "extremely high number of URLs", and the URLs listed are in my redirect directory, which is ostensibly not indexed.
If not by robots.txt, how can I keep Googlebot from wasting crawl time on these millions of /trackingredirect/ links?
-
Awesome, good to know things are all okay!
-
Yes, Google does not appear to be crawling or indexing any of the pages in question, and GWT doesn't note any issues with crawl budget.
-
And everything looks okay in your GWT?
-
This is what my other research has suggested, as well. Google is "discovering" millions of URLs that go into a queue to get crawled, and they're reporting the extremely high number of URLs in Webmaster Tools before they actually attempt to crawl, and see that all these URLs are blocked by robots.txt.
-
Hi Ehren,
Google has said that they send those warnings before they actually crawl your site (why they would bother you with a warning so quickly, I don't know), so I wouldn't worry about this if the warning is the only sign you're getting that Google might be crawling disallowed pages.
What is your Google Webmaster Tools account saying? If Google isn't reporting to you that it's spending too long crawling your site, and the correct number of pages are indexed, you should be fine.
Let me know if this is a bigger problem!
Kristina
-
Federico, my concern is how do I get Google to spend spending so much crawl time on those pages. I don't want Google to waste time crawling pages that are blocked in my robots.txt.
-
There's nothing you need to do. If you don't want those pages to be indexed leaving the robots.txt as it is is fine.
You can mark that in your Webmaster Tools as fixed and Google won't notify you again.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How handle pages with "read more" text query strings?
My site has hundreds of keyword content landing pages that contain one or two sections of "read more" text that work by calling the page and changing a ChangeReadMore variable. This causes the page to currently get indexed 5 times (see examples below plus two more with anchor tag set to #sectionReadMore2 This causes Google to include the first version of the page which is the canonical version and exclude the other 4 versions of the page. Google search console says my site has 4.93K valid pages and 13.8K excluded pages. My questions are: 1. Does having a lot of excluded pages which are all copies of included pages hurt my domain authority or otherwise hurt my SEO efforts? 2. Should I add a rel="nofollow" attribute to the read more link? If I do this will Google reduce the number of excluded pages? 3. Should I instead add logic so the canonical tag displays the exact URL each time the page re-displays in another readmore mode? I assume this would increase my "included pages" and decrease the number of "excluded pages". Would this somehow help my SEO efforts? EXAMPLE LINKS https://www.tpxonline.com/Marketplace/Used-AB-Dick-Presses-For-Sale.asp https://www.tpxonline.com/Marketplace/Used-AB-Dick-Presses-For-Sale.asp?ChangeReadMore=More#sectionReadMore1 https://www.tpxonline.com/Marketplace/Used-AB-Dick-Presses-For-Sale.asp?ChangeReadMore=Less#sectionReadMore1
Technical SEO | | DougHartline0 -
What is the best tool for getting a SiteMap url for a website with over 4k pages?
I have just migrated my website from HUGO to Wordpress and I want to submit the sitemap to Google Search Console (because I haven't done so in a couple years). It looks like there are many tools for getting a sitemap file built. But I think they probably vary in quality. Especially considering the size of my site.
Technical SEO | | DanKellyCockroach2 -
URL Structure for Product Pages
Hi Moz Community. I'm in need of some URL structure advice for product pages. We currently have ~4,000+ products and I'm trying to determine whether I need a new URL structure from the previous site owners. There are two current product URL structures that exist in our website: 1.http://www.example.com/bracelets/gold-bracelets/1-1-10-ct-diamond-tw-slip-on-bangle-14k-pink-gold-gh-i1-i2/ (old URL structure)
Technical SEO | | IceIcebaby
2. http://www.example.com/gemstone-bracelet-prd-bcy-121189/ (new URL structure) The problem is that half of our products are still in the old structure (no one moved them forward), but at the same time I'm not sure if the new structure is optimized as much as possible. Every single gemstone bracelet, or whatever product will have the same url structure, only being unique with the product number at the end. Would it be better to change everything over to more product specific URLS. I.e. example.com/topaz-gemstone-dangle-bracelet. Thanks for your help!
-Reed0 -
Target="_blank"
Do href links that leave a site and use target="_blank" to open a new tab impact SEO?
Technical SEO | | ChristopherGlaeser0 -
Home page URL
Hi, I work on this site: http://www.towerhousetraining.co.uk/about-us. This is the home page URL. Should this be 301'd to: http://www.towerhousetraining.co.uk? I have created a site map, which I submitted to Google Webmaster Tools, which includes these URL's: /about-us, /training-we-offer & /contact-us. There are a total of 3 pages on the website. Webmaster tools has only indexed 2 out of 3 pages. I think this is something to do with the /about-us URL, as when I do a site: search, these pages appear: www.towerhousetraining.co.uk/, /training-we-offer & /contact-us. I am not sure why Google has indexed the home page as www.towerhousetraining.co.uk/ and not /about-us? Is it a bad idea in general not to have your homepage as your root domain? I added a to the homepage, but am wondering if this was the right thing to do? Any help would be appreciated.
Technical SEO | | CWseo0 -
Do I need robots.txt and meta robots?
If I can manage to tell crawlers what I do and don't want them to crawl for my whole site via my robots.txt file, do I still need meta robots instructions?
Technical SEO | | Nola5040 -
Is it OK for a sitemap to appear as a "Top URL" in Google Webmaster?
I'm using Google Webmaster (alongside other tools) to understand how Google is indexing my site. One of the tools is "Content Keywords", where it lists keywords that Google sees as significant for your site. The keywords shown are generally fine, but when I click on an individual word, I am often seeing our sitemap as one of the "Top URLs" that the keyword is found on (our sitemap is at system/sitemap1.xml.gz) - is this OK? Obviously I don't want to add the sitemap URL to robots.txt, but I also want to ensure that 'real' user-focused pages (e.g. our homepage) appear higher in the "Top URLs" list for the keywords, as I'm assuming this is an indicator of how the site is performing in search. Any help appreciated!
Technical SEO | | anilababla0 -
Should search pages be disallowed in robots.txt?
The SEOmoz crawler picks up "search" pages on a site as having duplicate page titles, which of course they do. Does that mean I should put a "Disallow: /search" tag in my robots.txt? When I put the URL's into Google, they aren't coming up in any SERPS, so I would assume everything's ok. I try to abide by the SEOmoz crawl errors as much as possible, that's why I'm asking. Any thoughts would be helpful. Thanks!
Technical SEO | | MichaelWeisbaum0