Some URLs in the sitemap not indexed
-
Our company site has hundreds of thousands of pages. Yet no matter how big or small the total page count, I have found that the "URLs Indexed" in GWMT has never matched "URLS in Sitemap". When we were small and now that we have a LOT more pages, there is always a discrepancy of ~10% or so missing from the index.
It's difficult to know which pages are not indexed, but I have found some that I can verify are in the Sitemap.xml file but not at all in the index. When I go to GWMT I can "Fetch and Render" missing pages fine - it's not as though it's blocked or inaccessible.
Any ideas on why this is? Is this type of discrepancy typical?
-
Thanks. Very helpful!
-
This is great to know that 10% is a good discrepancy. Hard to know otherwise.
That article about Screaming Frog is super helpful, thanks!
-
I have never had a site with 100% crawled pages, sometimes Google will drop a page off for being too similar to another, not informative enough, canonical links set, redirects.
As Ryan says, don't just rely on Moz use Screaming Frog to get a good view of your site too, see if there are any errors. Also you can run the frog whenever you like, it's just a little more technical to understand.
Xenu oooh never heard of that one Ryan thanks!
Just looked into Xenu, Screaming frog does it all and some.
-
Hi Mase,
I've managed sites with with hundreds of thousands of pages too, and in my experience a discrepancy between what's offered up via the sitemaps and what gets indexed is typical (dare I say it, a 10% discrepancy seems pretty good!). Pages deeper in the site seem to suffer this fate more frequently than those with fewer subfolders, as do those with thin content.
I agree completely with Ryan's comment about Screaming Frog: it is an invaluable tool for site audits, in addition to lots of other useful site insights. You might find this article interesting to get a sense of the many ways you can use SF: http://www.seerinteractive.com/blog/screaming-frog-guide/
-
You're welcome. Definitely take a look at a crawler that gives you more insight, especially with a site as large as yours. Just note, no matter what you might never achieve an exact match between the pages you've submitted and the number indexed as Google can decide not to index a page for other reasons aside from the page's presence in a site map. Something useful for you as well would be to look at how many of your pages recieve visits in analytics. That will give you an idea of percentages on pages in the sitemap vs the index vs active.
-
I have not run the site through those tools you mentioned, I'm unfamiliar.
I am not, however, receiving any errors on those pages. And when I "Fetch and Render" in GWMT, they look and render fine without errors. I'm able to submit them to the index one-by-one.
Thanks for your response, Ryan.
-
Hi Mase. Are you getting errors on URLs you've submitted? Or ran other crawlers on your site like Xenu or ScreamingFrog to produce any possible errors? It's also good to know which pages might not have enough content to be indexed: filters, sorting views, etc.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Changing URLs for SEO
Hi, Currently we have a page, /business, but we have shifted our strategy to optimize for this page for the keyword "enterprise" instead of "business". The page authority of this page is 18 and our domain authority is 35. I've already updated content and title tags to more of an enterprise focus. Would it be wise to move the page to /enterprise and create a 301 redirect from /business to /enterprise? Or is this too risky from an SEO standpoint? Thanks!
Technical SEO | | mikekeeper0 -
What are the negative implications of listing URLs in a sitemap that are then blocked in the robots.txt?
In running a crawl of a client's site I can see several URLs listed in the sitemap that are then blocked in the robots.txt file. Other than perhaps using up crawl budget, are there any other negative implications?
Technical SEO | | richdan0 -
Is there a maximum sitemap size?
Hi all, Over the last month we've included all images, videos, etc. into our sitemap and now its loading time is rather high. (http://www.troteclaser.com/sitemap.xml) Is there any maximum sitemap size that is recommended from Google?
Technical SEO | | Troteclaser0 -
Will Google Recrawl an Indexed URL Which is No Longer Internally Linked?
We accidentally introduced Google to our incomplete site. The end result: thousands of pages indexed which return nothing but a "Sorry, no results" page. I know there are many ways to go about this, but the sheer number of pages makes it frustrating. Ideally, in the interim, I'd love to 404 the offending pages and allow Google to recrawl them, realize they're dead, and begin removing them from the index. Unfortunately, we've removed the initial internal links that lead to this premature indexation from our site. So my question is, will Google revisit these pages based on their own records (as in, this page is indexed, let's go check it out again!), or will they only revisit them by following along a current site structure? We are signed up with WMT if that helps.
Technical SEO | | kirmeliux0 -
Friendly URLS (SEO urls)
Hello, I own a eCommerce site with more than 5k of products, urls of products are : www.site.com/index.php?route=product/product&path=61_87&product_id=266 Im thinking about make it friend to seo site.com/category/product-brand Here is my question,will I lost ranks for make that change? Its very important to me know it Thank you very much!
Technical SEO | | matiw0 -
How can I see the SEO of a URL? I need to know the progress of a specific landing-page of my web. Not a keyword, an url please. Thanks.
I need to know the evolution on SEO of a specific landing-page (an URL) of my web. Not a keyword, a url. Thanks. (Necesito saber si es posible averiguar el progreso de una URL específica en el posicionamiento de Google. Es decir, lo que hace SEOmoz con las palabras clave pero al revés. Yo tengo una url concreta que quiero posicionar en las primeras posiciones de Google pero quiero ver cómo va progresando en función a los cambios que le voy aplicando. Muchas gracias)
Technical SEO | | online_admiral0 -
URL Rewrite
We are trying to convince a client to do a massive rewrite from all URL's looking like this: "www.company.com/category/categoryId=82374" to something like "www.company.com/womens/jackets/rain" How would you describe the importance and impact of doing URL rewrites to an ecommerce site? What evidence/research can we share with them to convince them it is worth the time and effort to do?
Technical SEO | | Hakkasan0 -
Sitemaps - Format Issue
Hi, I have a little issue with a client site whose programmer seems kind of unwilling to change things that he has been doing a long time. So, he has had this dynamic site set up for a few years and active in google webmaster tools and others, but is not happy with the traffic it is getting. When I looked at webmaster tools I see that he has a sitemap registered, but it is /sitemap.php When I said that we should be offering the SE's /sitemap.xml his response is that sitemap.php checks the site every day and generates /sitemap.xml, but there is no /sitemap.xml registered in webmaster tools. My gut is telling me that he should just register /sitemap.xml in webmaster tools, but it is a hard sell 🙂 Anyone have any definitive experience of people doing this before and whether it is an issue? My feeling is that it doesn't need to be rocket science... Any input appreciated, Sha
Technical SEO | | ShaMenz0