Building an XML Sitemap

An XML Sitemap feed lists all of the pages on your Web site that you want the search engines to know about. While theoretically the search engines should be able to find all of your pages by following links, it still helps to have a Sitemap in place for completeness and to take advantage of the benefits that the webmaster tools offer.

For SEO purposes, it is essential that you (a) build an XML Sitemap and (b) keep it up-to-date in order to help improve spiderability and ensure that all the important pages on your site are crawled and indexed. XML Sitemaps give the search engines a complete list of the pages you want indexed, along with supplemental information about those pages, including how frequently the pages are updated. This does not guarantee that all pages will be crawled or indexed, but it can help.

It's worth pointing out that an XML Sitemap is different from the standard site map that you include on your site. XML Sitemaps are feeds designed for search engines; they're not for people. They are merely lists of URLs, with some optional metadata about them, that are meant to be spidered by a search engine. A site map, on the other hand, is a Web page on your site that is designed to be viewed by visitors and contains links to help them navigate your site.

Sitemaps were designed to help sites that historically could not be crawled by the search engines (sites with dynamic content, Flash or Ajax) get their content spidered and listed in the index. That's not to say that using an XML Sitemap is a way around building a spiderable Web site, however, since all it does is hand a list of available URLs to the search engines. When creating a new site, make sure you are building it from a sound search engine optimization standpoint. Creating an XML Sitemap will not pass on any link popularity, nor will it help with subject theming.

An XML Sitemap is created using XML (Extensible Markup Language), a markup language commonly used on the Web in which custom tags can be created to share information. The required XML tags are <urlset>, <url>, and <loc>: <urlset> and <url> provide the structure of the file, and <loc> contains the page's URL.

Optional metadata tags are:

  • <lastmod> - last modified date.
  • <changefreq> - how often the page changes (such as hourly, daily, monthly, never).
  • <priority> - how important the page is from 0 (the lowest) to 1 (the highest).

Site owners aren't required to use these tags, but the engines may consult them when deciding how often to re-crawl pages. Google states in its Webmaster Guidelines that while it takes these tags into consideration, it does not base its spidering decisions solely on them, and that <priority> has no influence on rankings. Use these tags accurately to help the search engines spider your site more intelligently. Pages that you have optimized should be set to a higher priority. If you have archived pages that haven't been updated in years, they can be set to a low priority with a <changefreq> of "never".

An XML Sitemap listing for a URL looks like this:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.bruceclay.com/</loc>
    <lastmod>2008-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>

If you don't want to type that out by hand for each of your site's pages, fear not. There are quite a few Sitemap generators that will spider your site and build the file for you.

Be careful to set up the Sitemap Generator tool properly to avoid spidering pages you do not want indexed.
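
If you would rather script it yourself, a basic Sitemap is simple to generate. The sketch below uses only Python's standard library; the example URLs, priorities, and default values are placeholders, not recommendations.

import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages, out_path="sitemap.xml"):
    # <urlset> is the root element; each page gets its own <url> entry.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]  # required
        # Optional metadata tags; the defaults here are arbitrary placeholders.
        ET.SubElement(url, "lastmod").text = page.get("lastmod", date.today().isoformat())
        ET.SubElement(url, "changefreq").text = page.get("changefreq", "monthly")
        ET.SubElement(url, "priority").text = page.get("priority", "0.5")
    ET.ElementTree(urlset).write(out_path, encoding="utf-8", xml_declaration=True)

# Hypothetical pages: replace these with your own URLs.
build_sitemap([
    {"loc": "http://www.example.com/", "priority": "1.0", "changefreq": "weekly"},
    {"loc": "http://www.example.com/archive/2005/", "priority": "0.2", "changefreq": "never"},
])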

For very large Web sites, your XML Sitemap feed should be broken up into multiple files, as Google limits each file to 50,000 URLs and 10MB (a sketch of the splitting step follows the robots.txt example below). Once you have created the Sitemap file(s), upload them to the root of your Web site (e.g., http://www.your-domain-name.com/sitemap.xml). Once this is done, it's time to let the search engines know about it. One way you can do that is to specify your XML Sitemap in your robots.txt file by simply adding a "Sitemap:" line with the URL. It should look something like this:

User-agent: *
Sitemap: http://www.your-domain-name.com/sitemap.xml
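
Returning to the multiple-file case: the Sitemaps protocol defines a companion <sitemapindex> file that lists your individual Sitemap files so the engines can discover all of them. The sketch below shows one way to chunk a large URL list and write that index; it uses Python's standard library, and the file names, base URL, and chunk size are illustrative assumptions rather than requirements.

import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50000  # Google's per-file URL limit

def write_split_sitemaps(urls, base_url="http://www.example.com/"):
    # Write the URLs into as many sitemapN.xml files as needed.
    file_names = []
    for start in range(0, len(urls), MAX_URLS_PER_FILE):
        name = "sitemap%d.xml" % (len(file_names) + 1)
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for loc in urls[start:start + MAX_URLS_PER_FILE]:
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = loc
        ET.ElementTree(urlset).write(name, encoding="utf-8", xml_declaration=True)
        file_names.append(name)

    # Write a <sitemapindex> file pointing the engines at each piece.
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for name in file_names:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = base_url + name
        ET.SubElement(sitemap, "lastmod").text = date.today().isoformat()
    ET.ElementTree(index).write("sitemap_index.xml", encoding="utf-8", xml_declaration=True)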

Google, Yahoo! and MSN also offer other engine-specific ways for you to alert them to your XML Sitemap feed.
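
One such mechanism is an HTTP "ping" request that passes your Sitemap URL as a query parameter. The sketch below assumes the commonly documented form of Google's ping endpoint; check each engine's documentation for the exact URL it accepts before relying on it.

import urllib.parse
import urllib.request

def ping_google(sitemap_url):
    # Assumed endpoint form: http://www.google.com/ping?sitemap=<encoded Sitemap URL>
    endpoint = "http://www.google.com/ping?sitemap=" + urllib.parse.quote(sitemap_url, safe="")
    with urllib.request.urlopen(endpoint) as response:
        return response.getcode()  # 200 indicates the ping was received

# Hypothetical usage:
# ping_google("http://www.your-domain-name.com/sitemap.xml")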

Google:

You can submit your Sitemap through Google Webmaster Tools. This will allow you to see when Google last downloaded your Sitemap and any errors that may have occurred. Once you have validated your site, you can also view information such as Web Crawl Errors (including pages that were not found or timed out), Statistics (crawl rate, top search queries, etc.), and External and Internal Links. There are also other useful tools like the robots.txt analyzer.

Google Webmaster Tools is an incredibly valuable source of information that can help diagnose potential problems and allows you a glimpse into the way Google views your Web site. Google now offers specialized Sitemaps for Video, Mobile, News, and Code Search. These allow you to tell Google about news articles, videos, pages designed for mobile devices and publicly accessible source code on your Web site.

MSN:

MSN has launched its own webmaster tools, called Live Search Webmaster Center. Similar to, but not as robust as, Google's, it allows you to add your XML Sitemap feed, and once your site is validated you can view information about your Web site. The information is currently limited to whether pages are blocked, top links, and robots.txt validation.

Yahoo!:

You can submit your XML Sitemap feed through Yahoo!'s Site Explorer by simply entering the URL.

Yahoo! made Sitemaps a little more confusing by introducing its own version, which uses text files called urllist.txt. Many of the Sitemap generators will also build a urllist.txt file along with the XML Sitemap feed. Since Yahoo! also recognizes XML Sitemaps, you might as well stick with the XML format and avoid having to update two files.

Ask:

Ask supports XML Sitemaps but requires that you reference the feed in your robots.txt file for its spider to find it.

There are many benefits to creating an XML Sitemap. If you launch a new site, roll out a redesign, or perform a large update, Sitemaps are a good way to alert the search engines to the new pages and potentially get them indexed sooner. Another benefit of Sitemaps is the webmaster tools Google and MSN have built around them. These tools can give you valuable information about how the search engines see your site and help diagnose any potential problems that could hinder your rankings.

Once you have created your XML Sitemap and let the search engines know about it, make sure to keep it up-to-date. If you add or remove a page, make sure your Sitemap reflects that. You should also check Google Webmaster Tools frequently to ensure that Google is not finding any errors in your Sitemap.

The goal of XML Sitemaps is to help the search engines crawl smarter, so use the tags appropriately to show them how best to crawl your site. You can find more information about the Sitemaps protocol and XML schema at http://www.sitemaps.org.

 