XML Sitemaps

Sitemaps are an important part of website optimization because they give search engines a way to discover pages on a site. It isn't always possible to link internally to every page, especially on a large website. With sitemaps, however, you can ensure that Google is able to discover important pages, even those that have been orphaned.

Another important advantage of sitemaps is that they help search engines fetch pages quickly, especially when pages change. Sitemaps also provide valuable metadata about each page listed in a sitemap file. This metadata can tell Google and other search engines about the specific types of content on your pages, for example videos, images, or articles, and the attributes associated with these content types.

This guide covers the sitemap formats you can use for better discoverability, as well as the sitemap extensions available for different content types on your site.

Getting started

Below are a few steps to help you get started with sitemaps:

  1. Before you start building sitemaps, decide which pages should be crawled by Google. Avoid including broken pages, pages that have been deprecated or that you otherwise don't care about, and any pages that redirect to other locations. Include only the canonical version of each page.

  2. Decide on the sitemap format you're going to use: XML, RSS, text, etc. We'll cover the various sitemap formats in the sections below.

  3. Once you decide on the appropriate format for your sitemaps, you can start building them. You can create your sitemaps manually or choose from a number of tools and plugins. Here is a list of tools you can use: https://code.google.com/archive/p/sitemap-generators/wikis/SitemapGenerators.wiki

  4. After you've built your sitemap, you can test it with the Search Console Sitemaps testing tool. Correct any errors you see before submitting the sitemap to Google.

  5. As a final step, tell Google about your sitemap(s) by:

    1. Submitting it to Google directly through Search Console

    2. Adding the sitemap URL path to your robots.txt file

Sitemap formats

Google supports several sitemap formats, including XML, RSS, Atom, and even simple text files. The important difference between XML sitemaps and RSS/Atom feeds is that XML sitemaps describe an entire set of URLs within a site or section, whereas RSS/Atom feeds describe only the most recent changes. XML sitemaps therefore give Google information about all the pages on your site, while RSS/Atom feeds give Google only the latest updates, which helps keep your content fresh in its index. Regardless of format, every sitemap file should be limited to 50,000 URLs or 50MB (uncompressed). If you have a larger file or more URLs, you'll need to break them up into multiple sitemap files.
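
When a site is split across several sitemap files, those files are commonly tied together with a sitemap index file, as defined by the sitemaps.org protocol. The sketch below assumes two hypothetical sitemap files, sitemap-1.xml and sitemap-2.xml, hosted on www.example.com; the file names and dates are placeholders chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sitemap index: lists sitemap files, not individual page URLs -->
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <!-- Hypothetical file name; each listed file must stay within the per-file limits -->
    <loc>http://www.example.com/sitemap-1.xml</loc>
    <!-- Optional: when this sitemap file was last modified (placeholder date) -->
    <lastmod>2016-01-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap-2.xml</loc>
    <lastmod>2016-01-15</lastmod>
  </sitemap>
</sitemapindex>
```

With a layout like this, you would typically submit the index file rather than each individual sitemap.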

XML

Below is a very basic example of an XML sitemap that includes the location of a single URL:

          
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/foo.html</loc>
</url>
</urlset>
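
Because sitemaps also help search engines notice pages that have changed, an entry can carry an optional last-modified date. The following is a minimal sketch that adds the lastmod tag from the sitemaps.org protocol to the same entry; the date shown is an illustrative placeholder.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/foo.html</loc>
    <!-- Optional: date the page last changed, in W3C Datetime format (placeholder value) -->
    <lastmod>2016-01-01</lastmod>
  </url>
</urlset>
```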
        

Here is a more complex sitemap that includes a single URL, as well as image and video file information for resources on that page:

          
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
  xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>http://www.example.com/foo.html</loc>
    <image:image>
       <image:loc>http://example.com/image.jpg</image:loc>
       <image:caption>Dogs playing poker</image:caption>
    </image:image>
    <video:video>
      <video:content_loc>http://www.example.com/video123.flv</video:content_loc>
      <video:player_loc allow_embed="yes" autoplay="ap=1">http://www.example.com/videoplayer.swf?video=123</video:player_loc>
      <video:thumbnail_loc>http://www.example.com/thumbs/123.jpg</video:thumbnail_loc>
      <video:title>Grilling steaks for summer</video:title>
      <video:description>Cook the perfect steak every time.</video:description>
    </video:video>
  </url>
</urlset>
        

Google also supports specific XML extensions for videos, images, and news. Below we'll describe each extension in detail.

Video XML sitemaps

Video sitemaps are an excellent way to make sure that Google can discover all the video content on your site. Making your video content easier to find can improve your site's appearance in Google Video Search results. The video extension of the sitemap protocol lets you give Google descriptive information about your videos, such as the title, description, and duration. Below is an example of a typical video XML sitemap.

          
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
   <url>
     <loc>http://www.example.com/videos/some_video_landing_page.html</loc>
     <video:video>
       <video:thumbnail_loc>http://www.example.com/thumbs/123.jpg</video:thumbnail_loc>
       <video:title>Grilling steaks for summer</video:title>
       <video:description>Alkis shows you how to get perfectly done steaks every time</video:description>
       <video:content_loc>http://www.example.com/video123.mp4</video:content_loc>
       <video:player_loc autoplay="ap=1">http://www.example.com/videoplayer.mp4?video=123</video:player_loc>
       <video:duration>600</video:duration>
       <video:expiration_date>2009-11-05T19:20:30+08:00</video:expiration_date>
       <video:rating>4.2</video:rating>
       <video:view_count>12345</video:view_count>
       <video:publication_date>2007-11-05T19:20:30+08:00</video:publication_date>
       <video:family_friendly>yes</video:family_friendly>
       <video:restriction relationship="allow">IE GB US CA</video:restriction>
       <video:gallery_loc title="Cooking Videos">http://cooking.example.com</video:gallery_loc>
       <video:price currency="EUR">1.99</video:price>
       <video:requires_subscription>yes</video:requires_subscription>
       <video:uploader info="http://www.example.com/users/grillymcgrillerson">GrillyMcGrillerson</video:uploader>
       <video:live>no</video:live>
     </video:video>
   </url>
</urlset>
        

Video sitemap guidelines

  • Video content includes web pages which embed video, URLs to players for video, or the URLs of raw video content hosted on your site. If Google cannot discover video content at the URLs you provide, those entries will be ignored by Googlebot.

  • Each URL entry must contain the following information:
    • Title

    • Description

    • Play page URL

    • Thumbnail URL

    • Raw video file URL and/or the video player URL

  • Google can crawl the following video file types: .mpg, .mpeg, .mp4, .m4v, .mov, .wmv, .asf, .avi, .ra, .ram, .rm, .flv, .swf. All files must be accessible to Googlebot. Metafiles that require a download of the source via streaming protocols are not supported.

  • Make sure that your robots.txt file isn't blocking any of the items (including the play page URL, the video URL, and the thumbnail URL) included in each sitemap entry.

  • You can specify pages from different sites in one sitemap. All sites, including the one containing your sitemap, must be verified in Search Console.

  • You can host multiple videos on one web page.

  • Don't include a page in a video sitemap if the video is unrelated to the page, for example when the video is a small addendum to the page or has nothing to do with the main text content.
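
As a rough illustration of the required fields listed above, the sketch below shows a video entry stripped down to the play page URL, thumbnail URL, title, description, and raw video file URL. It reuses the placeholder example.com URLs from the fuller example and is a minimal sketch under those assumptions, not a complete entry.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <!-- Play page URL -->
    <loc>http://www.example.com/videos/some_video_landing_page.html</loc>
    <video:video>
      <!-- Thumbnail URL -->
      <video:thumbnail_loc>http://www.example.com/thumbs/123.jpg</video:thumbnail_loc>
      <video:title>Grilling steaks for summer</video:title>
      <video:description>Cook the perfect steak every time.</video:description>
      <!-- Raw video file URL; a video:player_loc could be given instead or in addition -->
      <video:content_loc>http://www.example.com/video123.mp4</video:content_loc>
    </video:video>
  </url>
</urlset>
```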

Images XML sitemaps

Using the XML sitemap extension for images, you can provide Google with helpful information about the images on your site. This information helps Google discover images that it might not otherwise find through a regular crawl. The example below shows a basic sitemap entry for a page that contains two images.

          
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>http://example.com/sample.html</loc>
    <image:image>
      <image:loc>http://example.com/image.jpg</image:loc>
    </image:image>
    <image:image>
      <image:loc>http://example.com/photo.jpg</image:loc>
    </image:image>
  </url> 
</urlset>
        

You can list up to 1,000 images for each page using the above syntax.

News XML sitemaps

A Google News sitemap allows you to control which content is submitted to Google News. One benefit of submitting a Google News sitemap is that Google can quickly find the news articles on your site.

          
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>http://www.example.org/business/article55.html</loc>
    <news:news>
      <news:publication>
        <news:name>The Example Times</news:name>
        <news:language>en</news:language>
      </news:publication>
      <news:genres>PressRelease, Blog</news:genres>
      <news:publication_date>2008-12-23</news:publication_date>
      <news:title>Companies A, B in Merger Talks</news:title>
      <news:keywords>business, merger, acquisition, A, B</news:keywords>
      <news:stock_tickers>NASDAQ:A, NASDAQ:B</news:stock_tickers>
    </news:news>
  </url>
</urlset>
        

News sitemap guidelines

  • Include URLs for articles published in the last 2 days. You can remove articles older than 2 days from the News sitemap, but they remain in the News index for the regular 30-day period.

  • Update your News sitemap with fresh articles as they're published.

  • Include no more than 1,000 URLs. If you want to include more, break these URLs into multiple sitemaps and use a sitemap index file to manage them.

  • Do not create a News sitemap for each update. Instead, update your current sitemap with your new article URLs.

Keep learning