Sitemaps might be one of the most overemphasized tools in all of SEO.
Whenever someone posts in a forum or group about some sort of problem they are having with their website, whether it be a loss in rankings, pages not getting indexed, pages dropping from the index, suddenly being outranked by a competitor, or just about anything else, someone pops into the thread and asks if they have tried submitting a new sitemap.
🤦
Somehow they are believed to be the answer to whatever ails you. They aren’t, but they can still be a useful tool.
First, we should cover what XML sitemaps are.
What is a sitemap?
The simplest analogy for an XML sitemap, not to be confused with an HTML sitemap, is that it is a roadmap (remember those, kids?) to guide search engines and help them discover your website’s most important content. It can also help give context to your website’s overall structure.
Sitemaps can help search engines to discover new content, content deep in a large site, or content that is not easily accessible because of a poor site structure.
In general, they were much more useful decades ago when search crawlers were not as advanced or efficient as they are today.
And that brings us to a few myths about sitemaps.
Common myths about sitemaps
Myth #1: You should include every page in your sitemap
You do not need to, nor should you, include every page on your site. There are a number of pages that you probably want to leave out of your sitemap such as login pages (both admin and user), pages located behind paywalls, pages that have been redirected, and non-canonical pages.
Myth #2: Every site needs a sitemap
This one might be a bit more controversial and some may argue with me, but well-structured small sites (sites with fewer than 50,000 to 100,000 pages) really do not need a sitemap.
Crawlers are pretty advanced these days and do a pretty good job of understanding the overall structure of your site and what pages will need their attention more often. That wasn’t always the case and assigning a priority in your sitemap helped a great deal, but Google has stated they no longer pay attention to priorities assigned in sitemaps.
Myth #3: Sitemaps determine what gets indexed
Just because you put a page inside of your sitemap does not guarantee a search engine will index that page. It will be crawled and considered for indexation, but it still may be left out.
Here are a few tips to keep in mind for your sitemaps:
Tips for using XML sitemaps
For today, we are just going to discuss the most common type of sitemaps, page sitemaps. There are also image sitemaps, video sitemaps, and news sitemaps.
Include your sitemap in robots.txt
Make search engines aware of your sitemap by including it in your robots.txt file with this simple line:
Sitemap: http://example.com/sitemap.xml
Obviously, change the location and filename to match your site.
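If you want to double-check that your robots.txt actually exposes the sitemap line, here is a minimal sketch using Python's standard library. The function name and the sample robots.txt content are my own illustration, not part of any particular site's setup.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return every Sitemap: URL declared in the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present (Python 3.8+).
    return parser.site_maps() or []


# Hypothetical robots.txt content, for illustration only.
ROBOTS = """User-agent: *
Disallow: /admin/
Sitemap: http://example.com/sitemap.xml
"""

print(sitemaps_from_robots(ROBOTS))
```

An empty list means crawlers reading your robots.txt will not find a sitemap there.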
Proper formatting
There is a proper format you should follow for sitemaps. It will consist of XML tags and must be entity-escaped.
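Entity escaping just means that characters like & and < inside a URL get replaced with their XML entities before the URL goes into the file. A minimal sketch of how that might be done in Python follows; the helper name escape_loc and the sample URL are assumptions for illustration.

```python
from xml.sax.saxutils import escape


def escape_loc(url: str) -> str:
    """Entity-escape a URL so it can be placed inside a <loc> tag."""
    # escape() handles &, < and > by default; quotes are added explicitly.
    return escape(url, {"'": "&apos;", '"': "&quot;"})


print(escape_loc("http://www.example.com/view?widget=3&count>2"))
```

If you build the file with an XML library instead of by hand, the library normally does this escaping for you.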
The Sitemap must:
- Begin with an opening <urlset> tag and end with a closing </urlset> tag.
- Specify the namespace (protocol standard) within the <urlset> tag.
- Include a <url> entry for each URL, as a parent XML tag.
- Include a <loc> child entry for each <url> parent tag.
There are additional tags that are optional:
<lastmod>
This is the date of the last modification of the page. It should be in YYYY-MM-DD format.
Going back to 2015, Gary Illyes of Google has stated that they mostly ignore this tag because it is not reliable. Webmasters have done a poor job of keeping it accurate. However, Google is not the only search engine in town, so it doesn’t hurt to keep using this.
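Since inaccurate or badly formatted dates are the main complaint about this tag, it may be worth generating and checking the values in code. This is a rough sketch that only covers the date-only YYYY-MM-DD form; both function names are hypothetical.

```python
from datetime import date


def format_lastmod(d: date) -> str:
    """Format a date as zero-padded YYYY-MM-DD."""
    return d.isoformat()


def is_valid_lastmod(value: str) -> bool:
    """Check that a string is a real calendar date written as YYYY-MM-DD."""
    try:
        # The round trip rejects other ISO spellings that newer Python
        # versions would otherwise accept, such as 20230723.
        return date.fromisoformat(value).isoformat() == value
    except ValueError:
        return False


print(format_lastmod(date(2023, 7, 23)))
print(is_valid_lastmod("2023-7-23"))
```

Note that the month and day must be zero-padded, so a value like 2023-7-23 fails the check.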
<changefreq>
This is an indication of how frequently the page is likely to change. You can use:
- always
- hourly
- daily
- weekly
- monthly
- yearly
- never
The value of always is for pages that change each time they are accessed. Google has stated that they may crawl a page marked yearly more frequently than that and may decide to crawl pages marked never from time to time.
I would only consider using never on very large complex sites (over ~250,000 pages) where crawl budget may be a concern.
Also, you cannot successfully abuse these. If you mark pages as hourly thinking it will give you an edge in getting things crawled and indexed faster, Google will catch on.
<priority>
Another optional tag. Priority has a range from 0.00 to 1.00, but it is a value that is relative to all other pages. In other words, you cannot just set every page to 1.00.
Here is an example of what a sitemap might look like:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.yoursite.com/</loc>
<lastmod>2023-07-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
<url>
<loc>http://www.yoursite.com/products</loc>
<changefreq>weekly</changefreq>
</url>
<url>
<loc>http://www.yoursite.com/services</loc>
<lastmod>2023-07-23</lastmod>
<changefreq>weekly</changefreq>
</url>
<url>
<loc>http://www.yoursite.com/blog</loc>
<lastmod>2023-06-23</lastmod>
<priority>0.9</priority>
</url>
</urlset>
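You rarely want to type a file like that by hand. Here is one way you might generate it with Python's built-in ElementTree module. Treat it as a sketch: the build_sitemap function and the dictionary keys are names I made up for the example, and the entries are placeholder data.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from dicts with 'loc' plus optional tag values."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # ElementTree entity-escapes text content when serializing.
        ET.SubElement(url, "loc").text = entry["loc"]
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Placeholder entries, for illustration only.
print(build_sitemap([
    {"loc": "http://www.yoursite.com/", "lastmod": "2023-07-01",
     "changefreq": "monthly", "priority": 0.8},
    {"loc": "http://www.yoursite.com/products", "changefreq": "weekly"},
]))
```

The output is a single line of XML after the declaration, which is perfectly valid even if it is less pretty than the hand-written version.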
Prioritize important pages
The <priority> tag is one that Google claims to no longer pay any attention to, but again, Google is not the only search engine around.
Priority is a relative value to the other pages on your site. Don’t just give everything a 1.00.
Instead you might use settings such as:
- Homepage – 0.85 (unless it changes frequently)
- Main landing, product, or service pages – 1.00
- Other landing pages – 0.90
- Main navigation pages – 0.80
- Blog category pages – 0.75
- Top pillar articles – 0.80
- Privacy, ToS, etc. – 0.20
- Other pages – 0.60
Include only canonical versions of your pages
Your sitemap should only include the canonical version of a page. If you include non-canonical versions of a page, it is giving conflicting information. You may get coverage errors in Google Search Console, and may even end up with the non-canonical version getting indexed over the canonical version.
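If your page data comes out of a crawl or a CMS export, a small filter can enforce this rule along with the exclusions from Myth #1. This is a sketch under assumptions: the sitemap_candidates function and the url, canonical, status, and noindex fields are hypothetical, so map them to whatever your own data actually contains.

```python
def sitemap_candidates(pages):
    """Return URLs that return 200, are indexable, and are self-canonical."""
    seen = set()
    urls = []
    for page in pages:
        if page["status"] != 200:  # skips redirects and error pages
            continue
        if page["noindex"]:  # skips pages you do not want indexed
            continue
        if page["canonical"] != page["url"]:  # skips non-canonical versions
            continue
        if page["url"] not in seen:  # skips exact duplicates
            seen.add(page["url"])
            urls.append(page["url"])
    return urls


# Made-up crawl rows, for illustration only.
pages = [
    {"url": "https://yoursite.com/shoes", "canonical": "https://yoursite.com/shoes",
     "status": 200, "noindex": False},
    {"url": "https://yoursite.com/shoes?sort=price", "canonical": "https://yoursite.com/shoes",
     "status": 200, "noindex": False},
    {"url": "https://yoursite.com/old-shoes", "canonical": "https://yoursite.com/old-shoes",
     "status": 301, "noindex": False},
]
print(sitemap_candidates(pages))
```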
Split up large sitemaps
A sitemap cannot exceed 50MB (uncompressed) or 50,000 URLs. If you have more than 50,000 URLs, you will need to split up your sitemap into multiple sitemaps and create a sitemap index.
Sometimes even if you do not exceed those values, it is still a good idea to split up sitemaps. It makes them easier to maintain and audit. I like to have a separate sitemap for a blog from the main website.
On a site I’m working on, we recently launched about 400 templates and are looking at boosting that to a few thousand. They are all going in their own sitemap.
A sitemap index has a slightly different format. Here is an example taken straight from https://www.sitemaps.org/protocol.html:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>http://www.example.com/sitemap1.xml.gz</loc>
<lastmod>2004-10-01T18:23:17+00:00</lastmod>
</sitemap>
<sitemap>
<loc>http://www.example.com/sitemap2.xml.gz</loc>
<lastmod>2005-01-01</lastmod>
</sitemap>
</sitemapindex>
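Splitting a big URL list and writing the index is easy to script. The sketch below chunks a list at the 50,000 limit and builds the index XML; the function names and the sitemap filenames are assumptions for illustration, and it does not check the 50MB size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-sitemap URL limit


def chunk_urls(urls, size=MAX_URLS):
    """Split a list of URLs into lists of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_locs):
    """Build sitemap index XML that points at each child sitemap."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_locs:
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Placeholder URLs, for illustration only.
urls = [f"http://www.example.com/page{i}" for i in range(120_001)]
chunks = chunk_urls(urls)
locs = [f"http://www.example.com/sitemap{n}.xml" for n in range(1, len(chunks) + 1)]
print(len(chunks))
print(build_sitemap_index(locs))
```

Each chunk would then be written out as its own sitemap file under the matching name.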
Ping Google when you make changes
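If you make significant changes to your sitemap or sitemaps, you can ping Google to get them to crawl it. You do it by sending a GET request, which you can do right from your browser. Keep in mind that Google announced in June 2023 that it is deprecating this ping endpoint, so do not count on it working going forward. Submitting the sitemap in Google Search Console and listing it in robots.txt are the more dependable options.
Type this into your browser with your sitemap’s URL included to ping Google.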
https://www.google.com/ping?sitemap=FULL_URL_OF_SITEMAP
Name your sitemaps appropriately and/or be sneaky
You do not have to use the standard sitemap.xml name.
Especially on larger sites with multiple sitemaps, it’s a good idea to use descriptive names such as blog_posts1.xml, blog_posts2.xml, main_site.xml, marketing_pages.xml, etc.
Also, if you want to keep from giving competitors an easy way to snoop around your site, get creative with your sitemap names so that they can’t just go to https://yoursite.com/sitemap.xml and find your sitemap.
I’ve used names like:
- hereisallmystuff.xml
- goonies.xml
- map_site_here.xml
- xmarksthespot.xml
You do not have to do this, but on a few sites I’m working on we are doing some creative things to get pages indexed, and I do not want to give competitors any ideas if I can avoid it.
Keep landing pages for ad campaigns out of your sitemap
I see this mistake a lot. Landing pages for ad campaigns should be marked noindex and completely orphaned from your website. They should not be included in a sitemap.