Google indexing fewer URLs than contained in my sitemap.xml
-
My sitemap.xml contains 3821 URLs, but Google (Webmaster Tools) indexes only 1544 of them. What may be the cause? There is no technical problem. Why does Google index fewer URLs than are contained in my sitemap.xml?
-
Thank you for helping
-
Unless you have an SEO actively reviewing your site, it is quite normal for Google to index fewer pages than are offered in your sitemap.
How exactly was your sitemap created? Did you go by hand through your site's 3821 pages and add them to a sitemap? Or, more likely, did you use a tool to create the sitemap? If you used a tool, how much do you know about how this tool works and how it is configured?
Just a few examples of URLs which may be included in your sitemap that Google would likely not index:
-
Your home page and other pages may have multiple URLs which lead to the same page. For example: www.mysite.com and www.mysite.com/index.html may be two URLs for the same page. Google will likely only index one of them.
-
You may have links to various URLs containing parameters, which Google will reduce to a single URL. For example: www.mysite.com/?product_id=308&sort=asc&color=black, and another URL www.mysite.com/?product_id=308&sort=desc&color=black. Both URLs lead to the same content sorted differently.
-
You may have duplicate content on your site. For example, you may sell chairs and list the same chair under multiple paths such as /furniture/wood/chair123 and /furniture/dining-room/chair123. Google will recognize that these two pages are the same content presented under multiple URLs.
-
Your sitemap may include pages which are blocked via robots.txt, carry the "noindex" tag, or are canonicalized to another page.
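As a rough illustration of the first two cases, here is a minimal Python sketch that groups sitemap URLs which differ only by a default document name or by a sort parameter. The names `PRESENTATION_PARAMS`, `DEFAULT_DOCUMENTS`, `normalize` and `group_duplicates` are made up for this example, and which parameters and file names are "presentation only" is an assumption you would have to adapt to your own site. It does not reproduce how Google itself chooses a canonical URL.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these query parameters only change presentation, not content.
PRESENTATION_PARAMS = {"sort"}
# Assumption: these file names are served as a directory's default document.
DEFAULT_DOCUMENTS = ("index.html", "index.htm", "index.php")


def normalize(url):
    """Return a simplified form of `url` for spotting likely duplicates."""
    parts = urlsplit(url)
    path = parts.path
    # Treat /index.html and friends as the directory itself.
    for name in DEFAULT_DOCUMENTS:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break
    if not path:
        path = "/"
    # Drop presentation-only parameters and sort the rest for a stable order.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in PRESENTATION_PARAMS
    )
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )


def group_duplicates(urls):
    """Map each normalized URL to the original URLs that collapse onto it."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    # Keep only groups where more than one sitemap URL shares a normalized form.
    return {key: members for key, members in groups.items() if len(members) > 1}
```

Calling `group_duplicates` on the list of URLs from your sitemap returns only the groups with more than one member. URL normalization cannot detect the chair example, where the same content lives under genuinely different paths, and it says nothing about robots.txt, "noindex" or canonical tags; those need to be checked on the pages themselves.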
In order to better understand the root issue you need to examine a list of all URLs in your sitemap and compare that to a list of all indexed URLs. Determine which URLs Google has not indexed and research the reason for each one independently.
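The comparison described above could be sketched in Python roughly as follows. This assumes you already have a list of indexed URLs from somewhere (for example, collected by hand into a text file, one URL per line); the function names `sitemap_urls` and `not_indexed` are hypothetical, and the sketch only handles a plain sitemap file, not a sitemap index.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the set of <loc> values found in a sitemap's XML text."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def not_indexed(sitemap_xml, indexed_urls):
    """Return the sitemap URLs that are missing from `indexed_urls`, sorted."""
    indexed = {url.strip() for url in indexed_urls if url.strip()}
    return sorted(sitemap_urls(sitemap_xml) - indexed)


# Tiny made-up sample; in practice, read your real sitemap.xml instead.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.mysite.com/</loc></url>
  <url><loc>http://www.mysite.com/index.html</loc></url>
  <url><loc>http://www.mysite.com/furniture/wood/chair123</loc></url>
</urlset>"""

SAMPLE_INDEXED = [
    "http://www.mysite.com/",
    "http://www.mysite.com/furniture/wood/chair123",
]

missing = not_indexed(SAMPLE_SITEMAP, SAMPLE_INDEXED)
print(missing)
```

The resulting list is the set of URLs to research one by one. The comparison is an exact string match, so a URL indexed under a slightly different form (for instance with or without a trailing slash) will show up as missing.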
-
-
Are they index-worthy?
Having them in your sitemap does not mean Google wants them in its index.
-
He just said it. Is this a new domain? I'm in the same boat as you for some of my domains.
-
Yes, I understand this. But in my situation, Google first indexed all the URLs in the sitemap.xml I uploaded to Google Webmaster Tools. Now Google indexes fewer URLs, only 50%. What can be the cause if there are no technical problems?
-
Hi!
Google will only spend 'so much time' on any new domain. The more traffic, links, and page authority you get, the more time Google will dedicate to crawling your website. You should also make sure that the site is not slow, as this will reduce the crawling speed even more! See Google Page Speed for tips on speeding up the load time of your site.
Good Luck,
Sven Witteveen
Expand Online