Sitemap Rules
-
Hello there,
I have some questions pertaining to sitemaps that I would appreciate some guidance on.
1. Can an XML sitemap contain URLs that are blocked by robots.txt? Logically, it makes sense to me not to include pages blocked by robots.txt, but I would like some clarity on the matter, i.e., will having pages blocked by robots.txt in a sitemap negatively impact the benefit of the sitemap?
2. Can an XML sitemap include URLs from multiple subdomains? For example:
http://www.example.com/www-sitemap.xml would include the home page URLs of two other subdomains, i.e., http://blog.example.com/ and http://blog2.example.com/
Thanks
-
In theory, a URL blocked by robots.txt should not appear in the search results whether or not it is in the sitemap, but I have seen robots.txt-blocked URLs get indexed when they are listed in the sitemap and have good links pointing to them. If you want to block pages that have good links pointing to them, my advice is to remove them from the sitemap. #justathought
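To act on that advice before submitting, one option is to check each candidate sitemap URL against robots.txt and drop the blocked ones. The sketch below uses Python's standard `urllib.robotparser`; the function name `filter_blocked_urls`, the sample rules and the example URLs are illustrative assumptions. Note that `robotparser` implements the original prefix-matching rules and does not reproduce every Google extension (such as `*` wildcards inside paths), so treat its answers as approximate.

```python
from urllib import robotparser


def filter_blocked_urls(robots_txt: str, urls: list[str], user_agent: str = "Googlebot") -> tuple[list[str], list[str]]:
    """Split candidate sitemap URLs into (allowed, blocked) per robots.txt."""
    parser = robotparser.RobotFileParser()
    # parse() takes the robots.txt body as a sequence of lines
    parser.parse(robots_txt.splitlines())
    allowed: list[str] = []
    blocked: list[str] = []
    for url in urls:
        if parser.can_fetch(user_agent, url):
            allowed.append(url)
        else:
            blocked.append(url)
    return allowed, blocked


# Hypothetical rules and URLs for illustration
robots = "User-agent: *\nDisallow: /private/\n"
allowed, blocked = filter_blocked_urls(
    robots,
    ["http://www.example.com/", "http://www.example.com/private/report.html"],
)
print("keep in sitemap:", allowed)
print("blocked, consider dropping:", blocked)
```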
As for URLs from multiple subdomains, I personally create a separate sitemap for each subdomain and link them from the main sitemap, and I see better indexing that way.
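One way to "link" several sitemaps to a main one is a sitemap index file. Below is a minimal sketch using Python's `xml.etree.ElementTree`; the function name and the per-subdomain sitemap URLs are hypothetical. A caveat worth checking before relying on it: the sitemaps.org protocol expects the sitemaps listed in an index to live on the same site as the index file, so an index on www.example.com that points at blog.example.com sitemaps may not be honored by every search engine.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Serialize the sitemap namespace as the default xmlns rather than "ns0:"
ET.register_namespace("", SITEMAP_NS)


def build_sitemap_index(sitemap_urls: list[str]) -> str:
    """Return a <sitemapindex> document with one <sitemap><loc> entry per URL."""
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for url in sitemap_urls:
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(index, encoding="unicode")


# Hypothetical per-subdomain sitemap locations
index_xml = build_sitemap_index([
    "http://www.example.com/www-sitemap.xml",
    "http://blog.example.com/sitemap.xml",
])
print(index_xml)
```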
Again, these are my personal experiences, not rules, so please keep in mind that your results may differ.
-
Hey,
1.) Yes, you can do this and it won't 'negatively impact it', but it might cause a couple of Search Console errors when you come to submit the URLs. Blocking crawlers in the robots.txt file is a directive that instructs them not to crawl that particular page. That said, supplying a sitemap of all page locations does not mean crawlers will crawl those pages, but it does tell them that the pages exist. Personally, I would meta noindex these pages to make sure they don't reach search engines, as blocking them in robots.txt is often not enough to prevent this, especially if you're also submitting a sitemap. Bear in mind that crawlers can only see a noindex tag on a page they are allowed to crawl, so the robots.txt block would need to be lifted for the noindex to take effect.
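The noindex suggestion interacts with robots.txt: a crawler can only read a noindex tag on a page it is allowed to fetch. The Python sketch below expresses that as a small check by combining `urllib.robotparser` with `html.parser.HTMLParser`. The class and function names are made up for illustration, and it only looks at `<meta name="robots">` tags; an `X-Robots-Tag` HTTP header would need a separate check.

```python
from html.parser import HTMLParser
from urllib import robotparser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values may be None
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def noindex_is_visible(robots_txt: str, url: str, html: str, user_agent: str = "Googlebot") -> bool:
    """True only if the URL may be crawled AND its HTML carries a robots noindex."""
    rules = robotparser.RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(user_agent, url):
        # The page is never fetched, so its noindex tag is never read
        return False
    meta = RobotsMetaParser()
    meta.feed(html)
    return "noindex" in meta.directives


# Hypothetical robots.txt and page markup
robots = "User-agent: *\nDisallow: /private/\n"
page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(noindex_is_visible(robots, "http://www.example.com/thank-you.html", page))
print(noindex_is_visible(robots, "http://www.example.com/private/a.html", page))
```

With these inputs the first call returns True and the second returns False, because the second page's noindex sits behind the Disallow rule and is never read.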
2.) In short, I don't think you can have a single XML sitemap containing URLs from multiple subdomains, BUT you can have individual sitemaps for multiple subdomains hosted on the main domain. Google has broken this down really well in their Webmaster Tools help article:
https://support.google.com/webmasters/answer/75712?hl=en&topic=8476&ctx=topic
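The sitemaps.org protocol states the underlying rule: every URL in a sitemap must use the same protocol and host as the sitemap file and sit under the sitemap's directory. (The protocol does allow an exception, where a host's robots.txt references a sitemap stored elsewhere.) The Python sketch below flags URLs that fall outside that scope; the function name is hypothetical, and the sample URLs come from the question.

```python
from urllib.parse import urlsplit


def out_of_scope_urls(sitemap_url: str, page_urls: list[str]) -> list[str]:
    """List page URLs outside the sitemap's scope (scheme, host, directory)."""
    base = urlsplit(sitemap_url)
    # Directory that holds the sitemap, e.g. "/" for /www-sitemap.xml
    directory = base.path.rsplit("/", 1)[0] + "/"
    flagged = []
    for url in page_urls:
        parts = urlsplit(url)
        same_origin = (parts.scheme, parts.netloc.lower()) == (base.scheme, base.netloc.lower())
        path = parts.path or "/"
        if not (same_origin and path.startswith(directory)):
            flagged.append(url)
    return flagged


flagged = out_of_scope_urls(
    "http://www.example.com/www-sitemap.xml",
    ["http://www.example.com/", "http://blog.example.com/", "http://blog2.example.com/"],
)
print(flagged)
```

Here both subdomain home pages are flagged, which matches the "one sitemap per subdomain" advice.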
Hope this helps!
Sean
Related Questions
-
If I'm using a compressed sitemap (sitemap.xml.gz), that's the URL that gets submitted to Webmaster Tools, correct?
I just want to verify that, if a compressed sitemap file is being used, the URL that gets submitted to Google, Bing, etc., and the URL that's used in robots.txt should indicate that it's a compressed file, for example "sitemap.xml.gz". Thanks!
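For reference, the sitemaps.org protocol allows sitemap files to be gzip-compressed, and it is the compressed file's own URL (e.g. `sitemap.xml.gz`) that gets listed. A minimal Python sketch, assuming a plain `<urlset>` with only `<loc>` entries; the helper names and the scratch-directory path are illustrative.

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)


def write_gzipped_sitemap(path: str, page_urls: list[str]) -> None:
    """Serialize a minimal <urlset> sitemap and store it gzip-compressed at path."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in page_urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    xml_bytes = ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
    with gzip.open(path, "wb") as fh:
        fh.write(xml_bytes)


def robots_sitemap_line(sitemap_url: str) -> str:
    """robots.txt directive naming the sitemap file, compressed or not."""
    return f"Sitemap: {sitemap_url}"


# Write into a scratch directory for the example
out_path = os.path.join(tempfile.mkdtemp(), "sitemap.xml.gz")
write_gzipped_sitemap(out_path, ["http://www.example.com/"])
robots_line = robots_sitemap_line("http://www.example.com/sitemap.xml.gz")
print(robots_line)
```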
Technical SEO | jgresalfi
-
Can you keep your old HTTP XML sitemap when moving to HTTPS site-wide?
Hi Mozers, I want to keep the HTTP XML sitemap live on my HTTP site to keep track of indexation during the HTTPS migration. I'm not sure if this is doable since, once our tech team forces the redirects, every HTTP page will become HTTPS. Any ideas? Thanks
Technical SEO | znotes
-
Is it possible to reference different sitemaps from the same robots.txt file?
Please advise.
Technical SEO | nlogix
-
Sitemap and crawl impact
If I have two links in the sitemap (for example, page1.html and page2.html) but the website contains more pages (page1.html, page2.html and page3.html), is this a signal for Google not to crawl the other pages? I.e., will Google index page3.html? Assume that any page can be accessed.
Technical SEO | ditoroin
-
How do you handle WordPress sitemaps within your site?
I have a regular sitemap on my site, and I also have a WordPress site installed within it that we use for blog/news content. I currently have an auto-sitemap generator installed in WordPress, which automatically updates the sitemap and submits it to the search engines each time the blog is updated. The question I have (which I think I know the answer to, but I just want to confirm) is: do I have to include all of the blog's articles in the main site's sitemap, even though the WordPress sitemap already has them? If I do include the articles in the main website's sitemap, they would also be in the WordPress sitemap, which is redundant. Redundancy is not good, so I just want to make sure.
Technical SEO | iresqkeith
-
Partial mobile sitemap
Hi, we have a main www website with a standard sitemap. We also have an m. site for mobile content (but m. only covers our top pages and doesn't include the entire site). If a mobile client accesses one of our www pages, we redirect to the m. page; if we don't have an m. version, we keep them on the www site. Currently we block robots from the mobile site. Since our m. site only contains the top pages, I'm trying to determine the boost we might get from creating a mobile sitemap. I don't want to create the "partial" mobile sitemap and somehow have it hurt our traffic. Here is my plan: 1) update m. pages to point rel canonical to the appropriate www page (to make sure we don't dilute SEO across m. and www.); 2) create a mobile sitemap and allow all robots to access the site. Our www pages already rank fairly highly, so I just want to verify whether there are any concerns, since m. is not a complete version of www.
Technical SEO | NicB1
-
Adjust the priority field under the XML sitemap option
For those familiar with this in Drupal: is this worth doing? It seems to be a setting that affects the priority of a URL compared to others on the site. It's set to a default of 0.5, but I think you can increase it up to 1.0. Does anyone know about this? Thanks
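For context, the sitemaps.org protocol defines `<priority>` as an optional value from 0.0 to 1.0, relative to other URLs on the same site, with a default of 0.5; Google's own sitemap documentation says it ignores this field. Below is a small Python sketch of emitting the tag with a range check. The `url_entry` helper is hypothetical and is not how Drupal implements the setting.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)


def url_entry(loc: str, priority: float = 0.5) -> ET.Element:
    """Build one <url> element, rejecting priorities outside 0.0-1.0."""
    if not 0.0 <= priority <= 1.0:
        raise ValueError(f"priority must be between 0.0 and 1.0, got {priority}")
    entry = ET.Element(f"{{{SITEMAP_NS}}}url")
    ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = loc
    # Rounded to one decimal place, like the protocol's examples (e.g. 0.8)
    ET.SubElement(entry, f"{{{SITEMAP_NS}}}priority").text = f"{priority:.1f}"
    return entry


# Hypothetical: raise the home page above the 0.5 default
home_xml = ET.tostring(url_entry("http://www.example.com/", 1.0), encoding="unicode")
print(home_xml)
```

Because the value is only relative within one site, raising every URL to 1.0 gives crawlers no extra information.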
Technical SEO | inhouseninja