Googlebot on steroids... Why?
-
We launched a new website (www.gelderlandgroep.com). The site contains 500 pages, but some pages (like https://www.gelderlandgroep.com/collectie/) contain filters, so there are a lot of possible URL parameters. Last week we noticed a tremendous amount of traffic (25 GB!!) and CPU usage on the server.
2017-12-04 16:11:57 W3SVC66 IIS14 83.219.93.171 GET /collectie model=6511,6901,7780,7830,2105-illusion&ontwerper=henk-vos,foklab 443 - 66.249.76.153 HTTP/1.1 Mozilla/5.0+(Linux;+Android+6.0.1;+Nexus+5X+Build/MMB29P)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/41.0.2272.96+Mobile+Safari/537.36+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) - - www.gelderlandgroep.com 200 0 0 9445 501 312
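For anyone who wants to measure this kind of traffic from their own logs, here is a minimal Python sketch that counts requests and bytes sent for entries whose user agent mentions Googlebot. The field order in `FIELDS` is an assumption inferred from the sample line above; the real order is declared in the `#Fields:` header of each IIS log file. The function names are hypothetical.

```python
from collections import Counter

# Assumed W3C field order, inferred from the sample line above. Check the
# "#Fields:" header of your own log files before relying on it.
FIELDS = (
    "date time s-sitename s-computername s-ip cs-method cs-uri-stem "
    "cs-uri-query s-port cs-username c-ip cs-version cs(User-Agent) "
    "cs(Cookie) cs(Referer) cs-host sc-status sc-substatus "
    "sc-win32-status sc-bytes cs-bytes time-taken"
).split()


def parse_line(line):
    """Return a dict for one log entry, or None for comments and odd lines."""
    if line.startswith("#"):
        return None
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def summarize_googlebot(lines):
    """Count hits per URL stem and total bytes sent to Googlebot user agents."""
    hits = Counter()
    bytes_sent = 0
    for line in lines:
        entry = parse_line(line)
        if entry is None or "Googlebot" not in entry["cs(User-Agent)"]:
            continue
        hits[entry["cs-uri-stem"]] += 1
        bytes_sent += int(entry["sc-bytes"])
    return hits, bytes_sent
```

Note that the user agent string can be spoofed, so this only shows what claims to be Googlebot.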
We found out that "Googlebot" was firing many, many requests. First we did an nslookup on the IP address, and it really does appear to be Googlebot.
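The nslookup check can also be scripted. Below is a rough sketch of the usual reverse-then-forward DNS verification: resolve the IP to a hostname, check that the hostname is under googlebot.com or google.com, then resolve that hostname back and confirm it returns the same IP. The function name and the injectable `reverse`/`forward` parameters are my own additions for illustration (they make the check testable without network access); the defaults are the standard library calls.

```python
import socket


def is_verified_googlebot(ip, reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex):
    """Reverse-resolve the IP, check the domain, then forward-resolve back."""
    try:
        hostname = reverse(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        addresses = forward(hostname)[2]  # IPv4 addresses for the hostname
    except OSError:
        return False
    return ip in addresses
```

Calling `is_verified_googlebot("66.249.76.153")` with the defaults performs live DNS lookups, so the result depends on your resolver.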
Second, we visited Google Search Console, and I was really surprised... Googlebot on steroids? Googlebot requested 922,565 different URLs and made combinations for every filter/parameter combination on the site. Why? The sitemap.xml contains 500 URLs... The authority of the site isn't very high, and there is no other signal that this is a special website... Why so many "Google resources"?
Of course we will exclude the parameters in Search Console, but I have never seen this much Googlebot activity on a small website like this before! Does anybody have a clue?
Regards Olaf
-
We got an answer from JohnMu, Webmaster Trends Analyst at Google. The reason for the crawling is (as we had found out) the filters, which have infinite variations (one of our developers was sleeping); we will correct this. Disallowing the URLs in robots.txt is advised as the quickest fix to stop the mega-crawling. This case will be used for further research because of the disproportionate capacity usage. You're right, Google will initially crawl everything, but they don't want Googlebot's crawling to look like a "mini-DDoS-like attack".
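The thread does not say which robots.txt rule was used. As an illustration only, a rule such as `Disallow: /collectie*?` (hypothetical) would target the filtered URLs while leaving the bare collection page crawlable. The sketch below is a simplified Python matcher for robots.txt path patterns with the `*` wildcard and `$` end anchor, handy for sanity-checking a rule before deploying it. It ignores `Allow` lines and longest-match precedence, so it is not a full robots.txt implementation.

```python
import re


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern ('*' wildcard, '$' end anchor)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path_and_query, disallow_patterns):
    """True if any Disallow pattern matches from the start of the URL path."""
    return any(rule_to_regex(p).match(path_and_query)
               for p in disallow_patterns)


# Hypothetical rule: block /collectie URLs only when a query string follows.
RULES = ["/collectie*?"]

for url in ["/collectie/", "/collectie?model=6511", "/collectie/?ontwerper=foklab"]:
    print(url, is_disallowed(url, RULES))
```

Keep in mind that robots.txt stops crawling, not indexing of URLs Google already knows about.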
-
Glad to help!
The large volume could well be to do with the way the filters are set up. There is also a possibility you could be sending some sort of authority signal somehow to Google, for instance if it is using the same Search Console as other valued brands or same WHOIS information.
My gut feeling is that after the initial crawl the traffic will reduce. If it doesn't, it probably means Google is finding something new to index, maybe dynamically created pages?
-
Thanks for your help!
I think you're probably right. The initial crawl must be complete if Google wants to put everything into the right perspective. But we manage and host more than 300 sites, including large A-brand sites, and even on those sites I have not seen this kind of volume before.
The server logs also show the same number of requests overnight (day five). I will keep you posted if this still continues after the weekend.
-
As far as I know, Google will attempt to find every single page it can possibly find regardless of authority. The frequency after the initial crawl will be affected by the site authority, volume and frequency of updates.
Virtually every page on every website that is publicly accessible will be indexed and ranked somewhere; where you rank will be determined by Google's ranking factors.
Keep in mind that search console stats will be a few days out of date (2 or 3 days) and it will normally take two or three days to crawl.
-
Mmm, is that correct? I thought that the amount of resources Google puts into crawling your (new) website also depends on its authority. 9 million URLs, for four days now... It seems to be so much for this small website...
-
I would say your filters are creating pages in their own right, or at least that is how Googlebot sees it. I have seen a similar thing happen on a site redesign. Potentially, if each filter combination can be reached through its own URL, each one could be listed as an individual page, assuming the content is different.
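To get a feel for how quickly filter URLs multiply, here is a back-of-the-envelope sketch in Python. It assumes each filter accepts any subset of its values (as the comma-separated `model=` and `ontwerper=` parameters in the log line suggest); the counts of 15 models and 5 designers are made up for illustration and are not the site's real numbers.

```python
from math import prod


def filter_url_count(values_per_filter):
    """Distinct selections when each filter accepts any subset of its
    values (including none), e.g. comma-separated query parameters."""
    return prod(2 ** n for n in values_per_filter)


# Hypothetical example: 15 model values and 5 designer values.
print(filter_url_count([15, 5]))  # 1048576
```

If the order of values within a parameter also produces distinct URLs, the real number is higher still.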
The first time Google crawls your site, it will try to find everything it possibly can to put in the index. Google will eat data like there's no tomorrow.
At this stage I wouldn't be too worried about it; just keep an eye out for duplicate content. I guess you'll see both graphs dip back down to normal levels within a few days.