XML Sitemap Index Percentage (Large Sites)
-
Hi all
I'm wanting to find out from those who have experience dealing with large sites (10s/100s of millions of pages).
What's a typical (or highest) percentage of indexed pages vs. submitted pages you've seen? This information can be found in webmaster tools where Google shows you the pages submitted & indexed for each of your sitemap.
I'm trying to figure out whether,
- The average index % out there
- There is a ceiling (i.e. will never reach 100%)
- It's possible to improve the indexing percentage further
Just to give you some background, sitemap index files (according to schema.org) have been implemented to improve crawl efficiency and I'm wanting to find out other ways to improve this further.
I've been thinking about looking at the URL parameters to exclude as there are hundreds (e-commerce site) to help Google improve crawl efficiency and utilise the daily crawl quote more effectively to discover pages that have not been discovered yet.
However, I'm not sure yet whether this is the best path to take or I'm just flogging a dead horse if there is such a ceiling or if I'm already at the average ballpark for large sites.
Any suggestions/insights would be appreciated. Thanks.
-
I've worked on a site that was ~100 million pages, and I've seen indexation percentages ranging from 8% to 95%. When dealing with sites this size, there are so, so many issues at play, and there are so few sites of this size that finding an average probably won't do you much good.
Rather than focusing on whether or not you have enough pages indexed based on averages, you should focus on two key questions: "do my sitemaps only include pages that would make great search engine entry pages" and "have I done everything possible to eliminate junk pages that are wasting crawl bandwidth."
Of course, making sure you don't have any duplicate content, thin content, or poor on-site optimization issues should also be a focus.
I guess what I'm trying to say is, I believe any site can have 100% of it's search entry worthy pages indexed, but sites of that size rarely have ALL of their pages indexed since sites that large often have a ton of pages that don't make great search results.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Moving html site to wordpress and 301 redirect from index.htm to index.php or just www.example.com
I found page duplicate content when using Moz crawl tool, see below. http://www.example.com
Intermediate & Advanced SEO | | gozmoz
Page Authority 40
Linking Root Domains 31
External Link Count 138
Internal Link Count 18
Status Code 200
1 duplicate http://www.example.com/index.htm
Page Authority 19
Linking Root Domains 1
External Link Count 0
Internal Link Count 15
Status Code 200
1 duplicate I have recently transfered my old html site to wordpress.
To keep the urls the same I am using a plugin which appends .htm at the end of each page. My old site home page was index.htm. I have created index.htm in wordpress as well but now there is a conflict of duplicate content. I am using latest post as my home page which is index.php Question 1.
Should I also use redirect 301 im htaccess file to transfer index.htm page authority (19) to www.example.com If yes, do I use
Redirect 301 /index.htm http://www.example.com/index.php
or
Redirect 301 /index.htm http://www.example.com Question 2
Should I change my "Home" menu link to http://www.example.com instead of http://www.example.com/index.htm that would fix the duplicate content, as indx.htm does not exist anymore. Is there a better option? Thanks0 -
How do we decide which pages to index/de-index? Help for a 250k page site
At Siftery (siftery.com) we have about 250k pages, most of them reflected in our sitemap. Though after submitting a sitemap we started seeing an increase in the number of pages Google indexed, in the past few weeks progress has slowed to a crawl at about 80k pages, and in fact has been coming down very marginally. Due to the nature of the site, a lot of the pages on the site likely look very similar to search engines. We've also broken down our sitemap into an index, so we know that most of the indexation problems are coming from a particular type of page (company profiles). Given these facts below, what do you recommend we do? Should we de-index all of the pages that are not being picked up by the Google index (and are therefore likely seen as low quality)? There seems to be a school of thought that de-indexing "thin" pages improves the ranking potential of the indexed pages. We have plans for enriching and differentiating the pages that are being picked up as thin (Moz itself picks them up as 'duplicate' pages even though they're not. Thanks for sharing your thoughts and experiences!
Intermediate & Advanced SEO | | ggiaco-siftery0 -
Is this a good sitemap hierarchy for a big eCommerce site (50k+ pages).
Hi guys, hope you're all good. I am currently in the process of designing a new sitemap hierarchy to ensure that every page on the site gets indexed and is accessible via Google. It's important that our sitemap file is well structured, divided and organised into relevant sub-categories to improve indexing. I just wanted to make sure that it's all good before forwarding onto the development team for them to consider. At the moment the site has everything thrown into /sitemap.xml/ and it exceeds the 50k limit. Here is what I have came up with: A primary sitemap.xml referencing other sitemap files, each of the following areas will have their own sitemap of which is referenced by /sitemap.xml/. As an example, sitemap.xml will contain 6 links, all of which link to other sitemaps. Product pages; Blog posts; Categories and sub categories; Forum posts, pages etc; TV specific pages (we have a TV show); Other pages. Is this format correct? Once it has been implemented I can then go ahead and submit all 6 separate sitemaps to webmaster tools + add a sitemap link to the footer of the site. All comments are greatly appreciated - if you know of a site which has a good sitemap architecture, please send the link my way! Brett
Intermediate & Advanced SEO | | Brett-S0 -
HTML or XML sitemap - benefits
Hi all, Can I use only HTML sitemap or I should use both versions?
Intermediate & Advanced SEO | | Tormar
How much I would lose in case when I would lose only HTML sitemap, without XML sitemap? Thank you.0 -
Should I bother with a Video Sitemap?
Morning all, I've started a pretty aggressive Video content push in recent weeks. All our videos are on our YouTube channel. I decided to go with hosting the videos on YouTube based on my research on moz.com, especially considering the potential reach of the content on YouTube. What I'm finding is that the YouTube channel is doing great. We've hit 200 subscribers and 15K views in a little under a month. Wayyyy more than I could have ever hoped for. But the blog posts on our website are getting minimal traffic and no search visibility. That doesn't necessarily bother me, since the intention of our marketing campaign is to use YouTube to drive traffic to our website. So I guess my question is really more to do with optimizing the site with Video Sitemaps and best practices for Google Webmaster Tools. Right now we have YouTube videos embedded on blog posts like this one that have a time-stamp. But I've been working to create Gallery-style pages (no time-stamp) which would have multiple YouTube videos embedded on them like this one. These make it easier for visitors to watch multiple videos without needing to skip around to multiple blog posts. The challenge I'm running into is that when I go to submit a Video Sitemap to GWT I get an error saying that I have duplicate page content within the video sitemap. I've used several WP plugins to do this. It seems that when there is a video embedded on multiple URLs (pages + posts) the plugins will ignore the posts and only add the pages to the video sitemap. Here is my regular Sitemap Here is my video Sitemap I've attached a screenshot of my current Yoast Video SEO config if that's useful for reference. Does anyone have experience with using multiple sitemaps in GWT? I'm starting to think that maybe I shouldn't even bother with a video sitemap. Maybe those gallery-style pages should just go in the regular sitemap? Any thoughts or advice would be highly appreciated! Thanks llQfydA
Intermediate & Advanced SEO | | TMHoward860 -
Content not indexed
How come i google content that resides on my website and on my homepage and my site doesn't come up? I know the content is unique i wrote that. I have a feeling i have some kind of a crawling issue but cannot determine what it is. I ran the crawling test and other tools and didn't find anything. Google shows me that pages are indexed but yet its weird try googling snippets of content and you'll see my site isnt anywhere. Have you experienced that before? First i thought it was penalized but i submitted the reconsideration request and it came back clear, No manual spam action found. And i did not get any message in my GWMT either. Any thoughts?
Intermediate & Advanced SEO | | CMTM0 -
Indexing non-indexed content and Google crawlers
On a news website we have a system where articles are given a publish date which is often in the future. The articles were showing up in Google before the publish date despite us not being able to find them linked from anywhere on the website. I've added a 'noindex' meta tag to articles that shouldn't be live until a future date. When the date comes for them to appear on the website, the noindex disappears. Is anyone aware of any issues doing this - say Google crawls a page that is noindex, then 2 hours later it finds out it should now be indexed? Should it still appear in Google search, News etc. as normal, as a new page? Thanks. 🙂
Intermediate & Advanced SEO | | Alex-Harford0 -
Can you see the 'indexing rules' that are in place for your own site?
By 'index rules' I mean the stipulations that constitute whether or not a given page will be indexed. If you can see them - how?
Intermediate & Advanced SEO | | Visually0