Crawl efficiency - Page indexed after one minute!
-
Hey Guys,A site that has 5+ million pages indexed and 300 new pages a day.I hear a lot that sites at this level its all about efficient crawlabitliy.The pages of this site gets indexed one minute after the page is online.1) Does this mean that the site is already crawling efficient and there is not much else to do about it?2) By increasing crawlability efficiency, should I expect gogole to crawl my site less (less bandwith google takes from my site for the same amount of crawl)or to crawl my site more often?Thanks
-
This is a complicated question that I can't give a simple answer for, as every site is set-up differently and has it's own challenges. You will likely use a variety of the techniques mentioned in my last paragraph above. Good luck.
-
Thanks Anthony,
Your explanation was very helpful.
Assuming that 3 millions pages out of my 5 are not so important for google to be crawling or indexing.
What would be the best way to optimize my crawl efficiency in relation to the amount of pages?
Just <noindex>3 million pages on the site, I believe this can be a risk move.</noindex>
Perhaps robots.txt but that would not de-index the existing pages.
-
Crawl efficiency isn't exactly the same as indexation speed. It is normal for a new page to be indexed quickly, often times it is linked to from the blog home page, shared on social networks, etc.
Crawl efficiency has a lot to do with making sure your most important pages are crawled as frequently as possible. Let's use the example of your site with 5,000,000 pages indexed. Perhaps there are 100,000 of those pages that are extremely important for your website. Your top categories, all of your products, your content, etc.
Then you are left with 4,900,000 pages that are not that important, but needed for the functionality of your website (pagination, filtering, sorting, etc). You have to determine, is it a good thing that Google has 5 million pages of your site indexed? Do you want Google regularly crawling those 4,900,000 pages, potentially at the expense of your more important pages?
Next, you check your Google Webmaster Tools and see that Google is crawling about 130,000 pages/day on your site. At that rate, it would take Google 38 days (over an entire month) to crawl your entire site. Of course, it doesn't actually work that way - Google will crawl your site in a logical manor, crawling the pages with high authority (well linked to internally/externally) much more often. The point is, you can see that not all of your pages are being crawled every day. You want your best content crawled as frequently as possible.
"To be more blunt, if a page hasn't been crawled recently, it won't rank well." This quote is taken from one of my favorite resources on this topic, is this post by AJ Kohn. http://www.blindfiveyearold.com/crawl-optimization
Crawl efficiency is guiding the search spiders to your best content and helping them learn what types of pages you can ignore. You do this primarily through: Site Structure, Internal Linking, robots.txt, NoFollow attribute and Parameter Handling in Google Webmaster Tools.
-
You can actually let Google know about a new mass of pages through the sitemap. The sitemap is a single file what can be parsed to produce a large list of links.
Google can discover new pages by comparing the list of links with what they know about.
Here's an intro link that covers the sitemap: http://blog.kissmetrics.com/get-google-to-index/
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
May integrating my main category page in the index page improve my ranking of main category keyword?
90% of our sales are made with products in one of our product categories.
Intermediate & Advanced SEO | | lcourse
A search for main category keyword returns our root domain index page in google, not the category page.
I was wondering whether integrating the complete main category directly in the index page of the root domain and this way including much more relevant content for this main category keyword may have a positive impact on our google ranking for the main category keyword. Any thoughts?1 -
Google Adsbot crawling order confirmation pages?
Hi, We have had roughly 1000+ requests per 24 hours from Google-adsbot to our confirmation pages. This generates an error as the confirmation page cannot be viewed after closing or by anyone who didn't complete the order. How is google-adsbot finding pages to crawl that are not linked to anywhere on the site, in the sitemap or linked to anywhere else? Is there any harm in a google crawler receiving a higher percentage of errors - even though the pages are not supposed to be requested. Is there anything we can do to prevent the errors for the benefit of our network team and what are the possible risks of any measures we can take? This bot seems to be for evaluating the quality of landing pages used in for Adwords so why is it trying to access confirmation pages when they have not been set for any of our adverts? We included "Disallow: /confirmation" in the robots.txt but it has continued to request these pages, generating a 403 page and an error in the log files so it seems Adsbot doesn't follow robots.txt. Thanks in advance for any help, Sam
Intermediate & Advanced SEO | | seoeuroflorist0 -
How to speed indexing of web pages after website overhaul.
We have recently overhauled our website and that has meant new urls as we moved from asp to php. we also moved from http to https. The website (https://) has 694 urls submitted through site map with 679 indexed in sitemap of google search console. As we look through the google search console analytics we notice that google index section / index status it says: https://www.xyz.com version - index status 2
Intermediate & Advanced SEO | | Direct_Ram
www.xyz.com version - index status 37
xyz.com version - index status 8 how can we get more pages to be indexed or found by google sooner rather than later as we have lost major traffic. thanks for your help in advance0 -
Meta No INDEX and Robots - Optimizing Crawl Budget
Hi, Sometime ago, a few thousand pages got into Google's index - they were "product pop up" pages, exact duplicates of the actual product page but a "quick view". So I deleted them via GWT and also put in a Meta No Index on these pop up overlays to stop them being indexed and causing dupe content issues. They are no longer within the index as far as I can see, i do a site:www.mydomain.com/ajax and nothing appears - So can I block these off now with robots.txt to optimize my crawl budget? Thanks
Intermediate & Advanced SEO | | bjs20100 -
No index.no follow certain pages
Hi, I want to stop Google et al from finding a some pages within my website. the url is www.mywebsite.com/call_backrequest.php?rid=14 As these pages are creating a lot of duplicate content issues. Would the easiest solution be to place a 'Nofollow/Noindex' META tag in page www.mywebsite.com/call_backrequest.php many thanks in advance
Intermediate & Advanced SEO | | wood1e19680 -
Home Page Got Indexed as httpS and Rankings Went Down.
Hello fellow SEO's About 3 weeks ago all of a sudden the home page on our Magento based website went down in rankings (from top 10 to page 3-4 Google) and was showing as httpS - instead of usual http. It first happened with just a few keywords and a week later any search phrase was returning the httpS result for the home page. When I view cache for the home page now it (both http and httpS versions) it gives me this http://clip2net.com/s/2OtPS We are not blocking anything in robots.txt Robots tags are set to index,follow There are hardly any external links pointing at the home pages as httpS This only affected the home page - all other pages rank where they used to and appear as http Has anybody ever had a similar problem? Thanks in advance for your thoughts and help
Intermediate & Advanced SEO | | ddseo0 -
Sitemap not indexing pages
My website has about 5000 pages submitted in the sitemap but only 900 being indexed. When I checked Google Webmaster Tools about a week ago 4500 pages were being indexed. Any suggestions about what happened or how to fix it? Thanks!
Intermediate & Advanced SEO | | theLotter0 -
How do I index these parameter generated pages?
Hey guys, I've got an issue with a site I'm working on. A big chunk of the content (roughly 500 pages) is delivered using parameters on a dynamically generated page. For example: www.domain.com/specs/product?=example - where "example' is the product name Currently there is no way to get to these pages unless you enter the product name into the search box and access it from there. Correct me if I'm wrong, but unless we find some other way to link to these pages they're basically invisible to search engines, right? What I'm struggling with is a method to get them indexed without doing something like creating a directory map type page of all of the links on it, which I guess wouldn't be a terrible idea as long as it was done well. I've not encountered a situation like this before. Does anyone have any recommendations?
Intermediate & Advanced SEO | | CodyWheeler0