Sitemap issues
-
Hi ALL
Okay, I'm a bit confused here. It says I have submitted 72 pages, but only 2 pages have been indexed?
I submitted a new sitemap for each of my 3 top-level domains, checked it today, and it's showing the result attached.
We are still having issues with meta tags showing up in the incorrect country.
If anyone knows how I can attend to this nightmare, it would be much appreciated, lol.
-
Awesome response Dirk! Thanks again for your endless help!
-
Hey again!
...I can't believe I didn't think of the simplicity of this earlier...
"it's even faster if you sort the URLs in alphabetical order & delete the rows containing priority / lastmod / etc. - then you only need to do a find/replace on the <loc> and </loc> tags"
100,000 spreadsheets and I don't even think of sorting for this task. Unreal. I laughed at myself aloud when I read it.
Thank you as usual, my friend!
-
Hi Patrick,
Your method is a good one - I use more or less the same trick to retrieve URLs from a sitemap in Excel (it's even faster if you sort the URLs in alphabetical order & delete the rows containing priority / lastmod / etc. - then you only need to do a find/replace on the <loc> and </loc> tags).
It's just that in this specific case, since the sitemap was generated in Screaming Frog, it's easier to eliminate these redirected URLs upfront.
Dirk
-
Thanks so much Dirk - this is great. I was referring to how I found the specific errors. Thanks for posting this for the sitemap - I definitely left a big chunk out on my part!
-
Hi Justin
Patrick's how-to is correct - but since you are generating your sitemap with Screaming Frog, there is really no need to go through this manual processing.
If you only need to create the sitemap:
Go to Configuration > Spider
Basic tab: uncheck everything apart from "Crawl Canonicals" (unchecking Images/CSS/JS/External Links is not strictly necessary, but it speeds up the crawl)
Advanced tab: check "Always Follow Redirects", "Respect Noindex" and "Respect Canonical"
After the crawl, generate the sitemap - it will now only contain the "final" URLs, after the redirects.
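(If you ever need to build the sitemap outside Screaming Frog, the same rule applies: include only the final, post-redirect URLs. A minimal Python sketch - the function name and example URLs are made up here:)

```python
from xml.sax.saxutils import escape

def build_sitemap(final_urls):
    """Build a minimal sitemap.xml containing only the given (final) URLs."""
    entries = "\n".join(
        f"  <url><loc>{escape(u)}</loc></url>" for u in final_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>"
    )
```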
Hope this helps,
Dirk
PS: Try to avoid internal links that are redirected - better to replace them with links to the final destination.
-
Hi Justin
Probably the easiest way to eliminate these cross-references is to ask your programmer to make all links relative rather than absolute. Relative links have the disadvantage that they can generate endless loops if something is wrong with the HTML - but that is something you can easily check with Screaming Frog.
If you check the .com version - for example https://www.zenory.com/blog/tag/love/ - it's calling zenory.co.nz for plenty of links (just check the source and search for .co.nz) - both the http and the https version.
You can check all these pages by hand - but I guess your programmer must be able to do this in an automated way.
It is also the case the other way round - on the .co.nz version you'll find references in the source to the .com version.
In Screaming Frog, the links with "NZ" are the only ones that should stay absolute, as they point to the other version.
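(For anyone who wants to automate the "view source and search for .co.nz" check, here is a rough stdlib Python sketch. The domain list comes from this thread; the function name is made up:)

```python
import re
from urllib.parse import urlparse

# The three sibling domains discussed in this thread
DOMAINS = {"zenory.com", "zenory.co.nz", "zenory.com.au"}

def cross_references(html, own_domain, domains=DOMAINS):
    """Return absolute URLs in `html` that point at a sibling domain
    instead of `own_domain` (http and https alike)."""
    urls = re.findall(r'https?://[^\s"\'<>]+', html)
    hits = []
    for url in urls:
        host = urlparse(url).netloc.lower()
        host = host[4:] if host.startswith("www.") else host  # drop "www."
        if host in domains and host != own_domain:
            hits.append(url)
    return hits
```

Run it over each page's source for each TLD; anything it returns is a cross-reference to fix.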
Hope this clarifies
Dirk
-
Wow thanks Patrick, let me run this and see how I go, thanks so much for your help!!!
-
Dirk, thanks so much for your help!
Could you tell me how to identify the URLs that are cross-referencing? I tried using Screaming Frog and looked under the External tab, then clicked on Inlinks and Outlinks. But what's really caught my eye is that a lot of the links are from the blog with the same anchor text "name"; others are showing up with a different name as well. Some say NZ NZ or AU AU as the anchor text, and I think this has to do with the flag dropdown for changing the top-level domains.
For eg:
FROM: https://www.zenory.co.nz/blog/tag/love/
TO: https://www.zenory.com.au/categories/love-relationships
Anchor Text: Twinflame Reading
-
Hi Justin
Yep! I use ScreamingFrog, here's how I do it:
-
Go to your /sitemap.xml
-
Select all + copy
-
Paste into Excel column A
-
Select column A
-
Turn "Wrap Text" off
-
Delete rows 1 through 5
-
Select column A again
-
"Find and Replace" the following:
-
<lastmod></lastmod>
-
<changefreq></changefreq>
-
daily
-
Whatever the date is
-
Priority numbers, usually 0.5 to 1.0
-
"Replace With" nothing, no spaces, nothing
-
You'll hit "Replace All" after every text string you put in, one at a time
-
With Column A still selected, hit F5
-
Click "Special"
-
Click "Blank" and "Ok"
-
Right click in the spreadsheet
-
Select "Delete" and "Shift Rows Up"
Voila! You have your list. Now copy this list, and open Screaming Frog. Click "Mode" up top and click "List". Click "Upload List" and click "Paste". Paste your URLs in there and hit Start.
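(The spreadsheet steps above can also be scripted in a few lines. A sketch, assuming a standard sitemap file - the function name is made up:)

```python
import xml.etree.ElementTree as ET

def sitemap_urls(xml_text):
    """Extract every <loc> URL from a sitemap, ignoring
    lastmod / changefreq / priority, and return them sorted."""
    root = ET.fromstring(xml_text)
    # Sitemaps declare this namespace; fall back to bare <loc> if absent.
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    locs = root.findall(".//sm:loc", ns) or root.findall(".//loc")
    return sorted(el.text.strip() for el in locs)
```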
Your sitemap will be crawled.
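(List mode essentially requests each URL and records the first status code it gets back. A rough stdlib equivalent, for anyone who prefers a script - these helper names are made up, and real runs hit the network, so treat it as a sketch:)

```python
import urllib.error
import urllib.request
from collections import defaultdict

class _NoRedirect(urllib.request.HTTPRedirectHandler):
    # Returning None tells urllib not to follow the redirect,
    # so the 301/302 surfaces as an HTTPError we can record.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

def fetch_status(url):
    """Status code of the first response for `url`, redirects not followed."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        return opener.open(url, timeout=10).status
    except urllib.error.HTTPError as e:
        return e.code

def group_by_status(urls, fetch=fetch_status):
    """Map status code -> list of URLs (pass a stub `fetch` to test offline)."""
    groups = defaultdict(list)
    for url in urls:
        groups[fetch(url)].append(url)
    return dict(groups)
```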
Here are URLs that returned 301 redirects:
https://www.zenory.com/blog/chat-psychic-readings/
https://www.zenory.com/blog/online-psychic-readings-private/
https://www.zenory.com/blog/live-psychic-readings/
https://www.zenory.com/blog/online-psychic-readings/
Here are URLs that returned 503 Service Unavailable codes twice, but 200s now:
https://www.zenory.com/blog/spiritually-love/
https://www.zenory.com/blog/automatic-writing-psychic-readings/
https://www.zenory.com/blog/author/psychic-nori/
https://www.zenory.com/blog/zodiac-signs/
https://www.zenory.com/blog/soul-mate-relationship-challenges/
https://www.zenory.com/blog/author/zenoryadmin/
https://www.zenory.com/blog/soulmate-separation-break-ups/
https://www.zenory.com/blog/how-to-find-a-genuine-psychic/
https://www.zenory.com/blog/mind-body-soul/
https://www.zenory.com/blog/twin-flame-norishing-your-flame-to-find-its-twin/
https://www.zenory.com/blog/tips-psychic-reading/
https://www.zenory.com/blog/tips-to-dealing-with-a-broken-heart/
https://www.zenory.com/blog/the-difference-between-soul-mates-and-twin-flames/
https://www.zenory.com/blog/sex-love/
https://www.zenory.com/blog/psychic-advice-break-ups/
https://www.zenory.com/blog/author/ginny/
https://www.zenory.com/blog/chanelling-psychic-readings/
https://www.zenory.com/blog/first-release-cycle-2015/
https://www.zenory.com/blog/psychic-shaman-readings/
https://www.zenory.com/blog/chat-psychic-readings/
https://www.zenory.com/blog/psychic-medium-psychic-readings/
https://www.zenory.com/blog/author/trinity/
https://www.zenory.com/blog/psychic-readings-karmic-relationships/
https://www.zenory.com/blog/can-psychic-readings-heal-broken-heart/
https://www.zenory.com/blog/guidance-psychic-readings/
https://www.zenory.com/blog/mercury-retrograde-effects-life/
https://www.zenory.com/blog/online-psychic-readings-private/
https://www.zenory.com/blog/psychics-mind-readers/
https://www.zenory.com/blog/angel-card-readings-psychic-readings/
https://www.zenory.com/blog/cheating-relationship/
https://www.zenory.com/blog/long-distance-relationship/
https://www.zenory.com/blog/soulmate-psychic-reading/
https://www.zenory.com/blog/live-psychic-readings/
https://www.zenory.com/blog/psychic-readings-using-rune-stones/
https://www.zenory.com/blog/psychic-clairvoyant-psychic-readings/
https://www.zenory.com/blog/psychic-guidance-long-distance-relationships/
https://www.zenory.com/blog/author/libby/
https://www.zenory.com/blog/online-psychic-readings/
I would check on that when you can. Check in Webmaster Tools to see if any issues have shown up there as well.
Hope this helps! Good luck!
-
Thanks so much Patrick! Can you recommend how I would go about finding the URLs in the sitemap that are redirecting? I'm assuming Screaming Frog?
-
Hi Justin
Google doesn't seem to be figuring out (even with the correct hreflang in place) which site should be shown for each country.
If you look at the cached versions of your .com.au and .com sites, it's always the .co.nz version that is cached - this is probably also why the meta description is wrong (it always comes from the .co.nz version) and why the percentage of URLs indexed for each sitemap (for the .com and .com.au versions) is so low.
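(For reference, every page on every TLD should carry the same reciprocal hreflang cluster. A sketch that prints such a cluster - the locale-to-domain mapping is my assumption from this thread's setup, not a confirmed configuration:)

```python
# Assumed mapping for the three sites discussed in this thread
SITES = {
    "en-nz": "https://www.zenory.co.nz",
    "en-au": "https://www.zenory.com.au",
    "en-us": "https://www.zenory.com",
}

def hreflang_tags(path="/"):
    """Each version of a page lists itself and both siblings."""
    return [
        f'<link rel="alternate" hreflang="{lang}" href="{base}{path}" />'
        for lang, base in SITES.items()
    ]
```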
Try to rigorously eliminate all cross-references on your site, to make it more obvious to Google that these are 3 different sites:
-
in the footer - the links in the second column are pointing to the .co.nz version (latest articles) - change these links to relative ones
-
on all sites there are elements you load from the .com domain (see the latest blog entries - the images are loaded from the .com domain on all TLDs)
As long as you send these confusing signals to Google - Google will mix up the different versions of your site.
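(Rewriting those footer and image links to relative ones is mechanical. A sketch, assuming the sibling-domain set from this thread - the function name is made up:)

```python
from urllib.parse import urlparse

# Sibling domains from this thread; links to any of them become relative,
# so each TLD ends up linking to itself.
SIBLINGS = {"zenory.co.nz", "zenory.com", "zenory.com.au"}

def make_relative(href, siblings=SIBLINGS):
    """Strip scheme and host from links to a sibling domain;
    leave genuinely external links untouched."""
    parts = urlparse(href)
    host = parts.netloc.lower()
    host = host[4:] if host.startswith("www.") else host
    if host in siblings:
        return parts.path + (f"?{parts.query}" if parts.query else "")
    return href
```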
rgds,
Dirk
-
Hi there Justin
Everything looks fine from here - there are a couple of URLs in your sitemap that need to be updated because they are redirecting.
Google takes time to index, so give this a little more time. You could ask Google to recrawl your URLs, but that's not really necessary at the moment; just something to note.
I would make sure your internal links are all good to go and "follow" so that crawlers can at least find URLs that way.
I did a quick site: search on Google, so far you have 58 pages indexed. You should be okay.
Hope this helps! Good luck!
-
Hi Justin,
A similar question was asked in this post: http://moz.com/community/q/webmaster-tools-indexed-pages-vs-sitemap
Hope this helps you.
Thanks