Crawl budget
-
I am a believer in this concept, showing google less pages will increase their importance.
here is my question: I manage a website with millions of pages, high organic traffic (lower than before).
I do believe that too many pages are crawled. there are pages that I do not need google to crawl and followed. noindex follow does not save on the mentioned crawl budget. deleting those pages is not possible. any advice will be appreciated. If I disallow those pages I am missing on pages that help my important pages.
-
I just wrote a better reply so forgive me the file was deleted by hit post. But I do think this information will help you get acquainted much better with a crawl budget
One thing that may not incorporate your domain however is that is not centered a external link is a content delivery or CDN URL that is not rewritten to the traditional CDN.example.com Content delivery network URLs can look like
- https://example.cachefly.net/images/i-love-cats.jpg
- https://example.global.prod.fastly.net/images/i-love-cats-more.png
So you may see URLs like one shown here should be considered part of your domain and treated like a domain even if they look like this but these are just two examples of literally thousands CDN variants.
Examples for this would be the cost of incorporating your hostname to a HTTPS encrypted content delivery network can be very expensive.
Crawl budget information
- http://www.stateofdigital.com/guide-crawling-indexing-ranking/
- https://www.deepcrawl.com/case-studies/elephate-fixing-deep-indexation-issues/
- https://builtvisible.com/log-file-analysis/
- https://www.deepcrawl.com/knowledge/best-practice/the-seo-ruler/ ( image below)
Tools for Crawl budget
I hope is more helpful,
Thomas
-
Nope, having more or less external links will have no effect on the crawl budget spent on your site. Your crawl budget is only spent on yourdomain.com. I just meant that, Google's crawler will follow external links, but it won't until it's spent its crawl budget on your site.
Google isn't going to give you a metric showing your crawl budget, but you can assume how much it is by going to Google Search Console > Crawl > Crawl Stats. That'll show you how many pages Googlebot has crawled per day.
-
I see, thank you.
I'm guessing if we have these external links across the site, it'll cause more harm than good for us if they use up some crawl budget on every page?
Is there anyway to find out what the crawl budget is?
-
To be clear, Googlebot will find those external links and leave your site, but not until they've used their crawl budget up on your site.
-
Nope.
-
I'd like to find out if this crawl budget applies to external sites - we link to sister companies in the footer.
Will Google bot find these external links and leave our site to go and crawl these external sites?
Thanks!
-
Using Google's parameter tools you can also reduce crawl budget issues.
https://www.google.com/webmasters/tools/crawl-url-parameters
http://www.blindfiveyearold.com/crawl-optimization
http://searchenginewatch.com/sew/news/2064349/crawl-index-rank-repeat-a-tactical-seo-framework
http://searchengineland.com/how-i-think-crawl-budget-works-sort-of-59768
-
What's the background of your conclusion? What's the logic?
I am asking because my understanding about crawl budget is that if you waste it you risk to 1) slow down recrawl frequency on a per page basis and 2) risk google crawler gives up crawling the website before to have crawled all pages.
Are you splitting your sitemap into sub sitemaps? That's a good way to spot groups/categories of pages being ignored by google crawler.
-
If you want to test this, identify a section of your site that you can disallow via robots.txt and then measure the corresponding changes. Proceed section by section based on results. There are so many variables at play that I don't think you'll get an answer that's anywhere as precise as testing your specific situation.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How can I stop my facets being crawled?
Hi If my facets are being crawled, how can I stop this? Or set them up so they are SEO friendly - this is new to me as I haven't had to deal with lots of facets in the past. Here's an example of a page on the site - https://www.key.co.uk/en/key/lift-tables Here's an example of a facet URL - https://www.key.co.uk/en/key/lift-tables#facet:-1002779711011711697110,-700000000000001001651484832107103,-700000000000001057452564832109109&productBeginIndex:0&orderBy:5&pageView:list& I've been trying to read up on URL parameters etc, I'm new to it so it's taking some time to understand Any advice would be great!
Intermediate & Advanced SEO | | BeckyKey0 -
Can spiders crawl javascript navigation now?
I was reading Danny Dover's book and decided to try some websites and so far everyone I have looked at has had navigation that does not work with disabled javascript. Is this still as important as it was at the time of publish (2011)? Thanks!
Intermediate & Advanced SEO | | Sika220 -
Wordpress to HubSpot CMS - I had major crawl issues post launch and now traffic is down 400%
Hi there good looking person! Our traffic went from 12k visitors in july to 3k visitors in july. << www.thedsmgroup.com >>When we moved our site from wordpress to the hubspot COS (their CMS system), I didnt submit a new sitemap to google webmaster tools. I didn't know that I had to... and to be honest, I've never submitted or re-submitted a sitemap to GWT. I have always built clean sites with fresh content and good internal linking and never worried about it. Yoast kind of took care of the rest, as all of my sites and our clients' sites were always on wordpress. Well, lesson learned. I got this message on June 27th in GWT_http://www.thedsmgroup.com/: Increase in not found errors__Google detected a significant increase in the number of URLs that return a 404 (Page Not Found) error. Investigating these errors and fixing them where appropriate ensures that Google can successfully crawl your site's pages._One month after our site launched we had 1,000 404s on our website. Ouch. Google thought we had a 1,200 page website with only 200 good pages and 1,000 error pages. Not very trust worthy... We never had a 404 ever before this, as we added a plugin to wordpress that would 301 any 404 to the homepage, so we never had a broken link on our site, which is not ideal for UX, but as far as google was concerned, our site was always clean. Obviously I have submitted a new sitemap to GWT a few weeks ago, and we are moving in the right direction... **but have I taken care of everything I need to? I'm not sure. Our traffic is still around 100 visitors per day, not 400 per day as it was before we launched the new site.**Thoughts?I'm not totally freaking out or anything, but a month ago we ranked #1 and #2 for "marketing agency nj", now we aren't in the top 100. I've never had a problem like this. _I added a few screen grabs from Google Webmaster Tools that should be helpful.__Bottom line, have I done everything I need to or do I need to do something with all of these "not found" error details that I have in GWT?_None of these "not found" pages have any value and I'm not sure how Google even found them... For example: http://www.thedsmgroup.com/supersize-page-test/screen-shot-2012-11-06-at-2-33-22-pmHelp! -JasonuhLLtou&h4QmGCW#0 uhLLtou&h4QmGCW#1
Intermediate & Advanced SEO | | Charlene-Wingfield0 -
URL Parameter Being Improperly Crawled & Indexed by Google
Hi All, We just discovered that Google is indexing a subset of our URL’s embedded with our analytics tracking parameter. For the search “dresses” we are appearing in position 11 (page 2, rank 1) with the following URL: www.anthropologie.com/anthro/category/dresses/clothes-dresses.jsp?cm_mmc=Email--Anthro_12--070612_Dress_Anthro-_-shop You’ll note that “cm_mmc=Email” is appended. This is causing our analytics (CoreMetrics) to mis-attribute this traffic and revenue to Email vs. SEO. A few questions: 1) Why is this happening? This is an email from June 2012 and we don’t have an email specific landing page embedded with this parameter. Somehow Google found and indexed this page with these tracking parameters. Has anyone else seen something similar happening?
Intermediate & Advanced SEO | | kevin_reyes
2) What is the recommended method of “politely” telling Google to index the version without the tracking parameters? Some thoughts on this:
a. Implement a self-referencing canonical on the page.
- This is done, but we have some technical issues with the canonical due to our ecommerce platform (ATG). Even though page source code looks correct, Googlebot is seeing the canonical with a JSession ID.
b. Resubmit both URL’s in WMT Fetch feature hoping that Google recognizes the canonical.
- We did this, but given the canonical issue it won’t be effective until we can fix it.
c. URL handling change in WMT
- We made this change, but it didn’t seem to fix the problem
d. 301 or No Index the version with the email tracking parameters
- This seems drastic and I’m concerned that we’d lose ranking on this very strategic keyword Thoughts? Thanks in advance, Kevin0 -
Crawling issue
Hello, I am working on 3 weeks old new Magento website. On GWT, under index status >advanced, I can only see 1 crawl on the 4th day of launching and I don't see any numbers for indexed or blocked status. | Total indexed | Ever crawled | Blocked by robots | Removed |
Intermediate & Advanced SEO | | sedamiran
| 0 | 1 | 0 | 0 | I can see the traffic on Google Analytic and i can see the website on SERPS when i search for some of the keywords, i can see the links appear on Google but i don't see any numbers on GWT.. As far as I check there is no 'no index' or robot block issue but Google doesn't crawl the website for some reason. Any ideas why i cannot see any numbers for indexed or crawled status on GWT? Thanks Seda | | | | |
| | | | |0 -
Duplicate site (disaster recovery) being crawled and creating two indexed search results
I have a primary domain, toptable.co.uk, and a disaster recovery site for this primary domain named uk-www.gtm.opentable.com. In the event of a disaster, toptable.co.uk would get CNAMEd (DNS alias) to the .gtm site. Naturally the .gtm disaster recover domian is an exact match to the toptable.co.uk domain. Unfortunately, Google has crawled the uk-www.gtm.opentable site, and it's showing up in search results. In most cases the gtm urls don't get redirected to toptable they actually appear as an entirely separate domain to the user. The strong feeling is that this duplicate content is hurting toptable.co.uk, especially as .gtm.ot is part of the .opentable.com domain which has significant authority. So we need a way of stopping Google from crawling gtm. There seem to be two potential fixes. Which is best for this case? use the robots.txt to block Google from crawling the .gtm site 2) canonicalize the the gtm urls to toptable.co.uk In general Google seems to recommend a canonical change but in this special case it seems robot.txt change could be best. Thanks in advance to the SEOmoz community!
Intermediate & Advanced SEO | | OpenTable0 -
Stop Google crawling a site at set times
Hi All I know I can use robots.txt to block Google from pages on my site but is there a way to stop Google crawling my site at set times of the day? Or to request that they crawl at other times? Thanks Sean
Intermediate & Advanced SEO | | ske110 -
Does Google crawl the pages which are generated via the site's search box queries?
For example, if I search for an 'x' item in a site's search box and if the site displays a list of results based on the query, would that page be crawled? I am asking this question because this would be a URL that is non existent on the site and hence am confused as to whether Google bots would be able to find it.
Intermediate & Advanced SEO | | pulseseo0