Crawl budget
-
I am a believer in this concept, showing google less pages will increase their importance.
here is my question: I manage a website with millions of pages, high organic traffic (lower than before).
I do believe that too many pages are crawled. there are pages that I do not need google to crawl and followed. noindex follow does not save on the mentioned crawl budget. deleting those pages is not possible. any advice will be appreciated. If I disallow those pages I am missing on pages that help my important pages.
-
I just wrote a better reply so forgive me the file was deleted by hit post. But I do think this information will help you get acquainted much better with a crawl budget
One thing that may not incorporate your domain however is that is not centered a external link is a content delivery or CDN URL that is not rewritten to the traditional CDN.example.com Content delivery network URLs can look like
- https://example.cachefly.net/images/i-love-cats.jpg
- https://example.global.prod.fastly.net/images/i-love-cats-more.png
So you may see URLs like one shown here should be considered part of your domain and treated like a domain even if they look like this but these are just two examples of literally thousands CDN variants.
Examples for this would be the cost of incorporating your hostname to a HTTPS encrypted content delivery network can be very expensive.
Crawl budget information
- http://www.stateofdigital.com/guide-crawling-indexing-ranking/
- https://www.deepcrawl.com/case-studies/elephate-fixing-deep-indexation-issues/
- https://builtvisible.com/log-file-analysis/
- https://www.deepcrawl.com/knowledge/best-practice/the-seo-ruler/ ( image below)
Tools for Crawl budget
I hope is more helpful,
Thomas
-
Nope, having more or less external links will have no effect on the crawl budget spent on your site. Your crawl budget is only spent on yourdomain.com. I just meant that, Google's crawler will follow external links, but it won't until it's spent its crawl budget on your site.
Google isn't going to give you a metric showing your crawl budget, but you can assume how much it is by going to Google Search Console > Crawl > Crawl Stats. That'll show you how many pages Googlebot has crawled per day.
-
I see, thank you.
I'm guessing if we have these external links across the site, it'll cause more harm than good for us if they use up some crawl budget on every page?
Is there anyway to find out what the crawl budget is?
-
To be clear, Googlebot will find those external links and leave your site, but not until they've used their crawl budget up on your site.
-
Nope.
-
I'd like to find out if this crawl budget applies to external sites - we link to sister companies in the footer.
Will Google bot find these external links and leave our site to go and crawl these external sites?
Thanks!
-
Using Google's parameter tools you can also reduce crawl budget issues.
https://www.google.com/webmasters/tools/crawl-url-parameters
http://www.blindfiveyearold.com/crawl-optimization
http://searchenginewatch.com/sew/news/2064349/crawl-index-rank-repeat-a-tactical-seo-framework
http://searchengineland.com/how-i-think-crawl-budget-works-sort-of-59768
-
What's the background of your conclusion? What's the logic?
I am asking because my understanding about crawl budget is that if you waste it you risk to 1) slow down recrawl frequency on a per page basis and 2) risk google crawler gives up crawling the website before to have crawled all pages.
Are you splitting your sitemap into sub sitemaps? That's a good way to spot groups/categories of pages being ignored by google crawler.
-
If you want to test this, identify a section of your site that you can disallow via robots.txt and then measure the corresponding changes. Proceed section by section based on results. There are so many variables at play that I don't think you'll get an answer that's anywhere as precise as testing your specific situation.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robots.txt - Do I block Bots from crawling the non-www version if I use www.site.com ?
my site uses is set up at http://www.site.com I have my site redirected from non- www to the www in htacess file. My question is... what should my robots.txt file look like for the non-www site? Do you block robots from crawling the site like this? Or do you leave it blank? User-agent: * Disallow: / Sitemap: http://www.morganlindsayphotography.com/sitemap.xml Sitemap: http://www.morganlindsayphotography.com/video-sitemap.xml
Intermediate & Advanced SEO | | morg454540 -
Would spiders successfully crawl a page with two distinct sets of content?
Hello all and thank you in advance for the help. I have a coffee company that sell both retail and wholesale products. These are typically the same product, just at different prices. We are planning on having a pop up for users to help them self identify upon their first visit asking if they are retail or wholesale clients. So if someone clicks retail, the cookie will show them retail pricing throughout the site and vice versa for those that identify themselves as wholesale. I can talk to our programmer to find out how he actually plans on doing this from a technical standpoint if it would be of assistance. My question is, how will a spider crawl this site? I am assuming (probably incorrectly) that whatever the "default" selection is (for example, right now now people see retail pricing and then opt into wholesale) will be the information/pricing that they index. So long story short, how would a spider crawl a page that has two sets of distinct pricing information displayed based on user self identification? Thanks again!
Intermediate & Advanced SEO | | ClayPotCreative0 -
Seomoz crawl reporting thousands of 302 redirects?
Hi Mozzers I need a little help with Magento redirects. We are using the Ultimo theme [1] and our crawl report has discovered several thousand 302 redirects occurring for links such as: Product comparison links and add to basket links which are formatted as “REPLACE THIS WITH THE INITIAL LINK” are rewritten to links such as “REPLACE THIS WITH THE REWRITTEN LINK ”. Is there any way to stop this from happening? Should I change these rewrites to 301 redirects? Or is this perfectly normal? [1] - http://themeforest.net/item/ultimo-fluid-responsive-magento-theme/3231798 Thanks in advance @Anthony_Mac85
Intermediate & Advanced SEO | | Tone_Agency0 -
When crawls occur - when will my links show up in Open Site Explorer
Hello everyone, I've been building links for a while now and none of them show up in Explorer. My domain authority hasn't changed for about a month or so. When does Google do crawls and when does SEOMoz do crawls? Thanks
Intermediate & Advanced SEO | | Harbor_Compliance0 -
How does the crawl find duplicate pages that don't exist on the site?
It looks like I have a lot of duplicate pages which are essentially the same url with some extra ? parameters added eg: http://www.merlin.org.uk/10-facts-about-malnutrition http://www.merlin.org.uk/10-facts-about-malnutrition?page=1 http://www.merlin.org.uk/10-facts-about-malnutrition?page=2 These extra 2 pages (and there's loads of pages this happens to) are a mystery to me. Not sure why they exist as there's only 1 page. Is this a massive issue? It's built on Drupal so I wonder if it auto generates these pages for some reason? Any help MUCH appreciated. Thanks
Intermediate & Advanced SEO | | Deniz0 -
SEOmoz is only crawling 2 pages out of my website
I have checked on Google Webmaster and they are crawling around 118 pages our of my website, store.itpreneurs.com but SEOmoz is only crawling 2 pages. Can someone help me? Thanks Diogo
Intermediate & Advanced SEO | | jslusser0 -
Website Crawl problems
I have a feeling that Google doesn't crawl my website. E.g. this blogpost - I copy a sentence from it and paste it to Google. The page that shows up in search results is www.silvamethodlife.com/page/9/ - which is just a blog page with all the articles listed, not the link to the article itself! Did anyone ever have this problem? It's definitely some technical issue. Any advice will be deeply appreciated Thanks
Intermediate & Advanced SEO | | Alexey_mindvalley0 -
Correlation Between Domain Authority and Crawl Penetration?
A. Is there a correlation between domain authority and crawl penetration? B. Is there a correlation between domain authority and juice distribution?
Intermediate & Advanced SEO | | AWCthreads0