Getting Google to index our sitemap
-
Hi,
We have a sitemap on AWS, retrievable at http://sitemap.shipindex.org/sitemap.xml. We have notified Google that it exists, and it found our 700k URLs (we are a database of ship citations, each with a unique URL). However, Google will not index them. It has been weeks and nothing. The weird part is that it did index some of them before, about 26k according to Google, and then it reported 0. Now that I have redone the sitemap, I can't get Google to look at it, and I have no idea why. This is really important to us: we want not only general keywords to find our front page, but also specific ship names to show links to us in results. Does anyone have any clues about how to get Google's attention and get our sitemap indexed? Or even just get it to crawl more of our site? It has crawled 35k pages, but then stopped.
-
Now I can see the sitemaps. Loading takes a long time and they look odd, but they may be OK. However, there is content in them that I would not want in Google's index. Nevertheless, what is the message in GSC (Google Search Console)?
(I opened them in Chrome, in Firefox, and on my Pixel as well. The first one looks good. All the linked ones had the error before; now they differ from each other (with or without line breaks, with or without spaces), but at least they contain links.)
Is the sitemap on a subdomain, listing pages on a different domain? (I hadn't noticed that.) That makes it much trickier...
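Andreas's host question matters because, under the sitemaps protocol, a sitemap is expected to list URLs on the same host it is served from; listing another host's pages needs a cross-submission, for example a Sitemap: line in that host's robots.txt, or both hosts verified in Search Console. A minimal Python sketch to see which hosts a sitemap's <loc> entries point at; the page URL in the usage is hypothetical:

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace used by <urlset> and <sitemapindex> in the sitemaps.org protocol
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def foreign_hosts(sitemap_url: str, sitemap_body: bytes) -> set[str]:
    """Return the hosts of <loc> entries that differ from the sitemap's own host."""
    own_host = urlsplit(sitemap_url).hostname
    root = ET.fromstring(sitemap_body)
    hosts = set()
    for loc in root.iter(f"{SITEMAP_NS}loc"):
        host = urlsplit((loc.text or "").strip()).hostname
        if host and host != own_host:
            hosts.add(host)
    return hosts


# Usage: the page URL below is hypothetical, for illustration only.
sample = (
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<url><loc>https://www.shipindex.org/ships/example</loc></url>"
    b"</urlset>"
)
print(foreign_hosts("http://sitemap.shipindex.org/sitemap.xml", sample))
```

A non-empty result doesn't mean the sitemap is invalid, only that the cross-submission has to be in place for Google to accept those URLs.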
-
I redid the sitemaps and made them plain XML, Andreas. It doesn't seem to have helped; they're still not getting indexed. I don't know where you were seeing that information in the sitemap files. Can you tell me how you opened them to see that? All I see is the normal content.
Shawn
-
Maybe I need to change them to plain XML files and update the index file?
-
Where are you seeing the error? I am opening them and see all the content required. I am confused. I don't think I have a key field in the sitemaps.
-
Hi,
I wrote a post about Google and sitemaps about two months ago (https://intenseo.de/seo-blog/google/google-sitemaps/), unfortunately in German. So I guess I have to translate the key points:
- A sitemap should contain no more than 50,000 URLs (Google News sitemaps: no more than 1,000)
- and should be no larger than 50 MB uncompressed
So you have to split it, and you already did.
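To make the splitting concrete: at 50,000 URLs per file, 700k URLs need 14 child sitemaps plus one sitemap index pointing at them. Below is a minimal Python sketch of that, not how ShipIndex's sitemaps are actually generated. The child naming (sitemaps/sitemapN.xml) mirrors the key sitemaps/sitemap1.xml quoted in the error below but is otherwise an assumption, and the ship URLs are placeholders.

```python
from __future__ import annotations

from xml.sax.saxutils import escape

MAX_URLS = 50_000                # per-file URL limit
MAX_BYTES = 50 * 1024 * 1024     # 50 MB uncompressed
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n'
QUOTES = {"'": "&apos;", '"': "&quot;"}  # escape() already handles &, < and >


def chunks(urls: list[str], size: int = MAX_URLS) -> list[list[str]]:
    """Split the full URL list into consecutive slices of at most `size` URLs."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def urlset_xml(urls: list[str]) -> str:
    """Render one child sitemap and enforce the uncompressed size limit."""
    body = "".join(f"<url><loc>{escape(u, QUOTES)}</loc></url>" for u in urls)
    xml = f'{HEADER}<urlset xmlns="{NS}">{body}</urlset>'
    if len(xml.encode("utf-8")) > MAX_BYTES:
        raise ValueError("sitemap exceeds 50 MB uncompressed; use smaller chunks")
    return xml


def index_xml(sitemap_urls: list[str]) -> str:
    """Render the sitemap index that points at the child sitemaps."""
    body = "".join(f"<sitemap><loc>{escape(u, QUOTES)}</loc></sitemap>" for u in sitemap_urls)
    return f'{HEADER}<sitemapindex xmlns="{NS}">{body}</sitemapindex>'


def build(urls: list[str], child_url_template: str) -> tuple[str, dict[str, str]]:
    """Return (index XML, {child sitemap URL: child XML}).

    child_url_template is a hypothetical naming scheme with one {n} placeholder.
    """
    children = {}
    for n, part in enumerate(chunks(urls), start=1):
        children[child_url_template.format(n=n)] = urlset_xml(part)
    return index_xml(list(children)), children


urls = [f"https://example.com/ship/{i}" for i in range(700_000)]  # placeholder URLs
index, children = build(urls, "http://sitemap.shipindex.org/sitemaps/sitemap{n}.xml")
print(len(children))  # 14 child sitemaps
```

The important part is that every file named in the index is actually uploaded under exactly that key; otherwise the index points at nothing.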
Now your main sitemap points to other sitemaps (zipped, but that's not a problem), OK. Whenever GSC tells me my sitemap has errors or no entries, I open it and check. So I just opened the first one; look what's in it:
<Error>
<Code>NoSuchKey</Code>
<Message>The specified key does not exist.</Message>
<Key>sitemaps/sitemap1.xml</Key>
<RequestId>97FFA90B9843EBCA</RequestId>
<HostId>vBzVH8Lx9fLYpPgv5SKfSzlKb4lcGxX4+V9JBO4f/M7HiDXQJT/hoLd9b/IYWanl06M41M4oCN8=</HostId>
</Error>
I opened all of your sitemaps: there are no entries in any of them. At least Google has indexed more than 8,000 pages, but certainly not via the sitemap.
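The check described above, opening each child sitemap and looking at what is actually in it, can be automated. A minimal Python sketch, assuming the index and child sitemaps are publicly fetchable over HTTP and may be gzipped. An S3 error document like the one quoted is reported as ("error", "NoSuchKey") instead of silently looking like an empty sitemap:

```python
from __future__ import annotations

import gzip
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def fetch(url: str) -> bytes:
    """Download a URL; on an HTTP error, return the error body (S3 sends XML)."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        return err.read()


def _root(body: bytes) -> ET.Element:
    """Parse XML, decompressing first if the body starts with the gzip magic bytes."""
    if body[:2] == b"\x1f\x8b":
        body = gzip.decompress(body)
    return ET.fromstring(body)


def inspect_sitemap(body: bytes) -> tuple[str, object]:
    """Return ("error", <Code>) for an S3-style <Error> document, or
    ("urlset" | "sitemapindex", number of <loc> entries)."""
    root = _root(body)
    tag = root.tag[len(NS):] if root.tag.startswith(NS) else root.tag
    if tag == "Error":
        return "error", root.findtext("Code", default="unknown")
    if tag in ("urlset", "sitemapindex"):
        return tag, len(root.findall(f".//{NS}loc"))
    return "unexpected", tag


def check_index(index_url: str) -> dict[str, tuple[str, object]]:
    """Fetch a sitemap index and inspect every child sitemap it lists."""
    index = _root(fetch(index_url))
    results = {}
    for loc in index.iter(f"{NS}loc"):
        child_url = (loc.text or "").strip()
        results[child_url] = inspect_sitemap(fetch(child_url))
    return results


# Needs network access:
# for url, status in check_index("http://sitemap.shipindex.org/sitemap.xml").items():
#     print(status, url)
```

Running something like this before every resubmission catches a missing key in seconds, instead of waiting weeks for Search Console to report 0.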
You can simply create sitemaps with a tool (see the list below) or with, e.g., Screaming Frog, upload them to your server (zipped or not doesn't matter), submit them to Google, and you're done. If your system isn't working at the moment, that's an easy short-term workaround. After that, find the bug in how your sitemaps are generated, fix it, and submit the new ones to Google. Before you submit them, open your sitemaps and check that they work. Don't wait weeks; Google is fast.
List of sitemap generators: https://code.google.com/archive/p/sitemap-generators/wikis/SitemapGenerators.wiki
Related Questions
-
Only 285 of 2,266 Images Indexed by Google
Only 285 of 2,266 images indexed by Google. Images for our site are hosted on Amazon's cloud-based CDN hosting service. Our WordPress site is on a virtual private server and has its own IP address. The number of indexed images has dropped substantially in the last year. Our site is for a real estate brokerage firm. There are about 250 listing pages set to "noindex". Perhaps these contain 400 photos, so they do not account for why so few photos have been indexed. The concern is that the low number of indexed images could be affecting overall ranking. The site URL is www.nyc-officespace-leader.com. Is this issue something we should be concerned about?
Thanks,
Alan
Intermediate & Advanced SEO | Kingalan1
-
Google Is Indexing My Internal Search Results - What should i do?
Hello, we are using a CMS/e-commerce platform which isn't really built with SEO in mind, and this has led to the following problem: a large number of internal (product search) search result pages, which aren't "search engine friendly" or "user friendly", are being indexed by Google and are driving traffic to the site, generating revenue for our client. We want to remove these pages and stop them from being indexed, replacing them with static category pages, essentially moving the traffic from the search results to static pages. We feel this is necessary because our current situation is a short-term (accidental) win, and later down the line, as more pages become indexed, we don't want to incur a penalty. We're hesitant to do a blanket de-indexation of all ?search results pages because we would lose revenue and traffic in the short term while trying to improve the rankings of our optimised static pages. The idea is to really move our static pages up in Google's index and, when their performance is strong enough, to de-index all of the internal search results pages. Our main focus is to improve user experience and not have customers enter the site through unexpected pages. All thoughts or recommendations are welcome. Thanks
Intermediate & Advanced SEO | iThinkMedia
-
Why Is Google Indexing These Product Pages On Shopify?
How can we communicate to Google the exact product pages we'd like indexed on our site? We're an apparel company that uses Shopify as our ecommerce platform. Our website is sportiqe.com. Currently, Google is indexing all sorts of different pages on our site.
Example of a product page we want indexed:
Product page: sportiqe.com/products/PRODUCT-TITLE (like this)
Examples of product pages being indexed:
sportiqe.myshopify.com/products/PRODUCT-TITLE
sportiqe.com/collections/COLLECTION-NAME/products/PRODUCT-TITLE
See attached for an example of how two different "Boston Celtics Grateful Dead" shirts are being indexed. Any suggestions? We've used both Shopify and Google Webmaster Tools to set our preferred domain (sportiqe.com). We also added this snippet of code to our site three months ago, thinking that would do the trick:
{% if template == 'product' %}{% if collection %} {% endif %}{% endif %}
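The goal here is one indexable URL per product, which is usually signalled with a rel="canonical" link in the page head. On Shopify that tag is produced by the theme's Liquid templates, which this sketch does not show; it only illustrates, in Python, the mapping the canonical URL needs to encode. It assumes https and that the product handle is the path segment right after /products/:

```python
from urllib.parse import urlsplit, urlunsplit

PREFERRED_HOST = "sportiqe.com"   # preferred domain stated in the question
SCHEME = "https"                  # assumption: adjust to the live site's scheme


def canonical_product_url(url: str) -> str:
    """Map any product URL variant to one preferred URL.

    Drops the myshopify.com host and any /collections/<name> prefix,
    keeping /products/<handle>; query strings and fragments are dropped.
    """
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if "products" in segments:
        i = segments.index("products")
        segments = segments[i:i + 2]        # ["products", "<handle>"]
    return urlunsplit((SCHEME, PREFERRED_HOST, "/" + "/".join(segments), "", ""))


for variant in (
    "http://sportiqe.myshopify.com/products/PRODUCT-TITLE",
    "http://sportiqe.com/collections/COLLECTION-NAME/products/PRODUCT-TITLE",
):
    print(canonical_product_url(variant))
```

Both variants map to https://sportiqe.com/products/PRODUCT-TITLE, which is the single URL the canonical tag on every variant should point at.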
Intermediate & Advanced SEO | farmiloe
-
How is Google crawling and indexing this directory listing?
We have three directory listing pages that are being indexed by Google:
http://www.ccisolutions.com/StoreFront/jsp/
http://www.ccisolutions.com/StoreFront/jsp/html/
http://www.ccisolutions.com/StoreFront/jsp/pdf/
How and why is Googlebot crawling and indexing these pages? Nothing else links to them (although /jsp/html/ and /jsp/pdf/ both link back to /jsp/). They aren't disallowed in our robots.txt file, and I understand that this could be why. If we add them to our robots.txt file and disallow them, will this prevent Googlebot from crawling and indexing those directory listing pages without prohibiting it from crawling and indexing the content that resides there, which is used to populate pages on our site?
Having these pages indexed in Google is causing a myriad of issues, not the least of which is duplicate content. For example, the file CCI-SALES-STAFF.HTML (which appears on the directory listing referenced above, http://www.ccisolutions.com/StoreFront/jsp/html/) clicks through to this web page: http://www.ccisolutions.com/StoreFront/jsp/html/CCI-SALES-STAFF.HTML
This page is indexed in Google and we don't want it to be. But so is the actual page where we intended the content of that file to display: http://www.ccisolutions.com/StoreFront/category/meet-our-sales-staff
As you can see, this results in duplicate content problems. Is there a way to disallow Googlebot from crawling that directory listing page and, provided that we have this URL in our sitemap (http://www.ccisolutions.com/StoreFront/category/meet-our-sales-staff), solve the duplicate content issue as a result? For example:
Disallow: /StoreFront/jsp/
Disallow: /StoreFront/jsp/html/
Disallow: /StoreFront/jsp/pdf/
Can we do this without risking blocking Googlebot from content we do want crawled and indexed? Many thanks in advance for any and all help on this one!
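Disallow rules match by path prefix, so the first proposed line alone already covers /jsp/html/, /jsp/pdf/ and every file beneath them, including CCI-SALES-STAFF.HTML, while leaving /StoreFront/category/ untouched. A quick sanity check with Python's standard urllib.robotparser, as a sketch only: that parser does plain prefix matching and does not implement Google's wildcard (*, $) or longest-match rules, so it's not proof of how Googlebot behaves. Two caveats worth keeping in mind: blocking crawling does not by itself remove URLs that are already indexed, and if any files under /jsp/ are fetched by the browser to build visible pages, blocking them may affect how Google renders those pages.

```python
from urllib.robotparser import RobotFileParser

# Candidate rule from the question. Disallow matches by path prefix, so this
# single line also covers /StoreFront/jsp/html/ and /StoreFront/jsp/pdf/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /StoreFront/jsp/
"""


def is_blocked(url: str, robots_txt: str = ROBOTS_TXT, agent: str = "Googlebot") -> bool:
    """True if robots_txt forbids `agent` from crawling `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


for url in (
    "http://www.ccisolutions.com/StoreFront/jsp/html/",
    "http://www.ccisolutions.com/StoreFront/jsp/html/CCI-SALES-STAFF.HTML",
    "http://www.ccisolutions.com/StoreFront/category/meet-our-sales-staff",
):
    print(is_blocked(url), url)
```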
Intermediate & Advanced SEO | danatanseo
-
Removing a Page From Google index
We accidentally generated some pages on our site that ended up getting indexed by Google. We have corrected the issue on the site, and all of those pages now return 404. Should we manually remove the extra pages from Google's index, or should we just let Google figure out that they are 404'd? What's the best practice here?
Intermediate & Advanced SEO | dbuckles
-
Most Painless way of getting Duff Pages out of SE's Index
Hi, I've had a few issues on our website that have been caused by our developers. Basically, we have a pretty complex method of automatically generating URLs and web pages on our website, and at some point they stuffed up the URLs and managed to get tens of thousands of duff URLs and pages indexed by the search engines. I now have to get these pages out of the search engines' indexes as painlessly as possible, as I think they are causing a Panda penalty. All these URLs have an additional directory level called "home" which should not be there, so I have www.mysite.com/home/page123 instead of the correct URL www.mysite.com/page123. All of these are totally duff URLs with no links pointing to them, so I gain nothing from 301 redirects. I was wondering if there is a more painless, less risky way of getting them all out of the indexes (i.e., after the stuff-up by our developers in the first place, I'm wary of letting them loose on 301 redirects in case they cause another issue!) Thanks
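Whichever way the duff URLs are removed (301s to the correct pages, or letting them return 404/410), the first step is a rule that identifies them exactly. Because they all share the extra "home" directory, that rule is a single prefix check that can be tested against a list of URLs before the developers deploy anything. A minimal sketch, assuming the duff segment is always the first path segment and is literally /home/:

```python
from __future__ import annotations

from urllib.parse import urlsplit, urlunsplit

DUFF_PREFIX = "/home/"   # the extra directory level described in the question


def corrected_url(url: str) -> str | None:
    """Return the correct URL for a duff /home/... URL, or None if the URL is fine."""
    parts = urlsplit(url)
    if not parts.path.startswith(DUFF_PREFIX):
        return None
    fixed_path = parts.path[len(DUFF_PREFIX) - 1:]   # "/home/page123" -> "/page123"
    return urlunsplit((parts.scheme, parts.netloc, fixed_path, parts.query, parts.fragment))


for url in (
    "http://www.mysite.com/home/page123",
    "http://www.mysite.com/page123",
    "http://www.mysite.com/homepage",
):
    print(url, "->", corrected_url(url))
```

Only the first URL is treated as duff; /homepage is left alone because the check requires the trailing slash.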
Intermediate & Advanced SEO | James77
-
Export a list of URLs in Google's index?
Is there a way to export an exact list of URLs found in Google's index?
Intermediate & Advanced SEO | nicole.healthline
-
Tool to calculate the number of pages in Google's index?
When working with a very large site, are there any tools that will help you calculate the number of links in the Google index? I know you can use site:www.domain.com to see all the links indexed for a particular URL. But what if you want to see the number of pages indexed for 100 different subdirectories (i.e. www.domain.com/a, www.domain.com/b)? Is there a tool to help automate the process of finding the number of pages from each subdirectory in Google's index?
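If you already have a list of indexed URLs from somewhere, the per-subdirectory breakdown itself is a simple grouping. This sketch does not query Google (and site: result counts are rough estimates anyway); it only assumes you have full URLs including the scheme, and groups them by first path segment. The sample URLs are placeholders:

```python
from collections import Counter
from urllib.parse import urlsplit


def pages_per_subdirectory(urls):
    """Count URLs by their first path segment ("/a", "/b", ...; "/" for the root)."""
    counts = Counter()
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        counts["/" + segments[0] if segments else "/"] += 1
    return counts


indexed = [  # placeholder URLs
    "https://www.domain.com/",
    "https://www.domain.com/a/page-1",
    "https://www.domain.com/a/page-2",
    "https://www.domain.com/b/page-1",
]
for subdir, n in pages_per_subdirectory(indexed).most_common():
    print(subdir, n)
```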
Intermediate & Advanced SEO | nicole.healthline