EPiServer: Dynamic SiteMap.xml generator for Google

by Brad 12. June 2008 10:09

Ever since I stumbled across Google's Webmaster Tools a while ago, I've been meaning to create a custom HttpHandler that dynamically generates a (virtual) SiteMap.xml file, which Google (and other search engines) can use as a reference when spidering my sites.

I thought I'd share the basic implementation as a starting point. Ideally you'd add "change frequency" and "priority" properties to each EPiServer page type for the sitemap generator to use, but for this basic version I've simply given the homepage a priority of 1.0 (the maximum) and a daily change frequency, and all other pages a priority of 0.6 and a weekly change frequency.
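If you do add per-page "change frequency" and "priority" properties, the generator needs a rule for combining them with the defaults above. Below is a minimal sketch, assuming the override values have already been read from the page (as a string and a nullable double); the SitemapHints class and its method names are hypothetical, not part of EPiServer. It accepts only the change frequencies defined by the sitemaps.org protocol (always, hourly, daily, weekly, monthly, yearly, never) and clamps priorities to the protocol's 0.0 to 1.0 range, falling back to this post's defaults otherwise.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: picks <changefreq>/<priority> values for a page,
// preferring per-page overrides and falling back to the defaults used in this post.
public static class SitemapHints
{
    // The only values the sitemaps.org protocol allows for <changefreq>.
    private static readonly HashSet<string> AllowedFrequencies = new HashSet<string>(
        new[] { "always", "hourly", "daily", "weekly", "monthly", "yearly", "never" },
        StringComparer.OrdinalIgnoreCase);

    public static string ChangeFrequency(bool isStartPage, string overrideValue)
    {
        // Use the override only if it is one of the protocol's values.
        if (!string.IsNullOrEmpty(overrideValue) && AllowedFrequencies.Contains(overrideValue))
        {
            return overrideValue.ToLowerInvariant();
        }
        return isStartPage ? "daily" : "weekly";
    }

    public static string Priority(bool isStartPage, double? overrideValue)
    {
        double value = overrideValue ?? (isStartPage ? 1.0 : 0.6);
        // The protocol only allows 0.0 to 1.0, so clamp anything outside that range.
        value = Math.Max(0.0, Math.Min(1.0, value));
        // Invariant culture so the decimal separator is always a dot.
        return value.ToString("0.0", CultureInfo.InvariantCulture);
    }
}
```

Inside RenderNodesToSiteMap you might then call something like SitemapHints.ChangeFrequency(p.PageLink == PageReference.StartPage, p["ChangeFrequency"] as string), where "ChangeFrequency" is a hypothetical property you would add to your page types.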

First declare the class and the required members:

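using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Web;
using System.Xml;
using EPiServer;
using EPiServer.Core;

public class SearchEngineSiteMap : IHttpHandler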
{
    bool IHttpHandler.IsReusable
    {
        get { return true; }
    }
    void IHttpHandler.ProcessRequest(HttpContext context)
    {
        GenerateSiteMap(context);
    }

Next we need to set up the response, create an XmlTextWriter and write the outer XML element:

/// <summary>
/// Generate the SiteMap
/// </summary>
/// <param name="context">Current HTTP context</param>
private void GenerateSiteMap(HttpContext context)
{
    //Set the response information
    context.Response.Expires = -1;
    context.Response.ContentType = "application/xml";
    Encoding encoding = new UTF8Encoding();
    context.Response.ContentEncoding = encoding;

    //Create an XmlTextWriter to build the XML, writing straight to the response's output stream
    XmlTextWriter xmlTextWriter = new XmlTextWriter(context.Response.OutputStream, encoding);
    xmlTextWriter.Formatting = Formatting.Indented;
    xmlTextWriter.WriteStartDocument();

    //Write the root xml element
    xmlTextWriter.WriteStartElement("urlset");
    xmlTextWriter.WriteStartAttribute("xmlns");
    xmlTextWriter.WriteValue("http://www.sitemaps.org/schemas/sitemap/0.9");
    xmlTextWriter.WriteEndAttribute();

    //Get EPiServer's StartPage (not the RootPage!)
    PageData p = EPiServer.DataFactory.Instance.GetPage(PageReference.StartPage);

    //SiteMaps can only contain unique urls so maintain a list of added urls
    List<string> alreadyAddedUrls = new List<string>();

    //Now call the recursive method to add every published page below the start page
    RenderNodesToSiteMap(
        context,
        xmlTextWriter,
        alreadyAddedUrls,
        p);

    //Close the root element
    xmlTextWriter.WriteEndElement();
    //end of document
    xmlTextWriter.WriteEndDocument();
    //finally close the XMLTextWriter
    xmlTextWriter.Close();
}
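The outer structure written above (an XML declaration and a urlset root in the sitemaps.org namespace) can be tried outside of EPiServer and an HttpContext. The sketch below is an assumption-laden stand-in, not the handler itself: SitemapXmlSketch is a hypothetical class, and it uses XmlWriter.Create over a MemoryStream instead of an XmlTextWriter over the response stream. It also shows two small alternatives: passing the namespace to WriteStartElement declares it as the default namespace without writing the xmlns attribute by hand, and WriteElementString escapes characters such as & on its own, so URLs should be written as-is.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical sketch: builds the same urlset structure as GenerateSiteMap,
// but into a MemoryStream so the output can be inspected.
public static class SitemapXmlSketch
{
    private const string SitemapNamespace = "http://www.sitemaps.org/schemas/sitemap/0.9";

    public static string Write(IEnumerable<string> urls)
    {
        var encoding = new UTF8Encoding(false); // UTF-8 without a byte-order mark
        var settings = new XmlWriterSettings { Encoding = encoding, Indent = true };

        using (var stream = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                writer.WriteStartDocument();
                // Passing the namespace declares it as the default namespace on <urlset>.
                writer.WriteStartElement("urlset", SitemapNamespace);
                foreach (string url in urls)
                {
                    writer.WriteStartElement("url", SitemapNamespace);
                    // The writer escapes &, < and > itself, so no HtmlEncode is needed.
                    writer.WriteElementString("loc", SitemapNamespace, url);
                    writer.WriteEndElement();
                }
                writer.WriteEndElement();
                writer.WriteEndDocument();
            } // disposing the writer flushes everything into the stream
            return encoding.GetString(stream.ToArray());
        }
    }
}
```

For example, a URL containing a query string such as http://example.com/?a=1&b=2 comes out as <loc>http://example.com/?a=1&amp;b=2</loc>, escaped exactly once.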

Finally we need to add a method that will be recursively called for each published page in the site:

/// <summary>
/// Recursively converts the given page into XML for use in the sitemap.
/// </summary>
/// <param name="context">Current Context</param>
/// <param name="xmlTextWriter">XmlTextWriter to write the given page (p) to</param>
/// <param name="alreadyAddedUrls">List of Urls already added to the SiteMap</param>
/// <param name="p">The page to add to the sitemap</param>
private void RenderNodesToSiteMap(
    HttpContext context,
    XmlTextWriter xmlTextWriter,
    List<string> alreadyAddedUrls,
    PageData p)
{
    //Make sure the page is published.
    //PageDataUtilities is a project-specific helper class (not shown in this post)
    if (PageDataUtilities.IsPagePublished(p))
    {
        //Get the page's 'Friendly' URL
        string url = PageDataUtilities.GetFriendlyUrl(p, true);

        // Make sure this URL is not in the XML already
        if (!alreadyAddedUrls.Contains(url))
        {
            //Add it ready to check later
            alreadyAddedUrls.Add(url);
            //Write the Url element
            xmlTextWriter.WriteStartElement("url");
            //Add the location (loc) element. No HtmlEncode here: XmlTextWriter
            //escapes &, < and > itself, so encoding first would double-encode the URL
            xmlTextWriter.WriteElementString("loc", url);
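            //Add when it was last modified, converted to UTC so the trailing "Z" is correct
            //(assumes Changed is stored in the server's local time)
            xmlTextWriter.WriteElementString(
                "lastmod",
                p.Changed.ToUniversalTime().ToString("u", CultureInfo.InvariantCulture).Replace(" ", "T"));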
            //If it's the StartPage, set the change frequency to daily
            //and the priority to 1.0
            if (p.PageLink == PageReference.StartPage)
            {
                xmlTextWriter.WriteElementString("changefreq", "daily");
                xmlTextWriter.WriteElementString("priority", "1.0");
            }
            else //Otherwise weekly and a lower priority
            {
                xmlTextWriter.WriteElementString("changefreq", "weekly");
                xmlTextWriter.WriteElementString("priority", "0.6");
            }
            //Close the URL node
            xmlTextWriter.WriteEndElement();
        }
        //Now recurse into each of this page's children
        foreach (PageData child in EPiServer.DataFactory.Instance.GetChildren(p.PageLink))
        {

            RenderNodesToSiteMap(
                context,
                xmlTextWriter,
                alreadyAddedUrls,
                child);

        }
    }
} //end of RenderNodesToSiteMap
} //end of SearchEngineSiteMap
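The recursive walk above can be sketched independently of EPiServer. The sketch below assumes a hypothetical SketchPage class standing in for PageData, with a plain IsPublished flag in place of the PageDataUtilities.IsPagePublished check. One deliberate swap: it tracks already-added URLs in a HashSet<string> instead of a List<string>, because HashSet lookups are constant-time on average while List.Contains scans the whole list on every page, which adds up on large sites. Like the original, it stops at an unpublished page without visiting that page's children.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for EPiServer's PageData: just enough to show the traversal.
public class SketchPage
{
    public string Url;
    public bool IsPublished;
    public List<SketchPage> Children = new List<SketchPage>();

    public SketchPage(string url, bool isPublished)
    {
        Url = url;
        IsPublished = isPublished;
    }
}

public static class SitemapTraversalSketch
{
    // Depth-first walk mirroring RenderNodesToSiteMap: unpublished pages (and
    // everything below them) are skipped, and each URL is collected only once.
    public static List<string> CollectUrls(SketchPage root)
    {
        var seen = new HashSet<string>();
        var result = new List<string>();
        Visit(root, seen, result);
        return result;
    }

    private static void Visit(SketchPage page, HashSet<string> seen, List<string> result)
    {
        if (!page.IsPublished)
        {
            return;
        }
        // HashSet.Add returns false when the URL has already been added.
        if (seen.Add(page.Url))
        {
            result.Add(page.Url);
        }
        foreach (SketchPage child in page.Children)
        {
            Visit(child, seen, result);
        }
    }
}
```

In the real handler, the body of the "if (seen.Add(...))" branch is where the <url> element would be written.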

With all that done, the last step is to register the handler in Web.config (inside the <system.web> element) as follows:

<httpHandlers>
  ...
  <add 
    path="sitemap.xml" 
    verb="*" 
    type="MyLibrary.SearchEngineSiteMap, MyLibrary" />
  ...
</httpHandlers>

Tags:

ASP.NET | C# | EPiServer

Comments

6/12/2008 1:51:44 PM #

Brad,

Question about the HttpHandler you've set up: wouldn't this slow down the whole site, since every page request would now regenerate the site map?

On a multi-user site, would there be problems with concurrent writes to the file, and duplicated work?

Would a better place be Session_Start or Application_Start?

Dominic

Dominic Zukiewicz

11/3/2008 1:17:05 PM #

There are some errors in the code, depending on how you look at it:

    if (PageDataUtilities.IsPagePublished(child))

    string url = PageDataUtilities.GetFriendlyUrl(p, true);

I assume PageDataUtilities is an internal class in your project for doing PageData operations, so I simply replaced it with my own implementation. The "child" variable I assumed was a typo, and that you meant the "p" variable instead.

With those changes it works, but you might want to update the article to reflect them, as it doesn't compile as is. Great code example otherwise :)

Johan


