How to use a CDATA section with an XmlSerializer

by Dominic Zukiewicz 2. February 2010 22:29

Recently, I’ve had to interpret some user input and then place this input into an XML file for processing by BizTalk Server 2006. Unfortunately, BizTalk Server 2006 likes you to encode characters using their XML equivalents. Let me explain.

Background

This can seem quite easy using the System.Xml.Serialization.XmlSerializer, with its ability to automatically generate XML and escape invalid characters for us. There are problems though.

Here is a template class:

    public class TestClass
    {
        public string Element1 { get; set; }
        public string Element2 { get; set; }
        public string Element3 { get; set; }
    }

Now some code to test it being serialized:

    TestClass tc = new TestClass();
    tc.Element1 = "Hello World!";
    tc.Element2 = "Yo yo yo yo !";
    tc.Element3 = "And some more text here!";

    XmlSerializer serializer = new XmlSerializer(typeof(TestClass));

    // Dispose the writer so the output is flushed and the file is closed
    using (XmlWriter writer = XmlWriter.Create("UTF-8.xml"))
    {
        serializer.Serialize(writer, tc);
    }

And what do we get?

    <?xml version="1.0" encoding="utf-8"?>
    <TestClass> <!-- Schemas removed for clarity -->
      <Element1>Hello World!</Element1>
      <Element2>Yo yo yo yo !</Element2>
      <Element3>And some more text here!</Element3>
    </TestClass>

 

Problem

So what is the problem? Well, let's see how it copes with characters that XML would prefer to be encoded, like &, £ and ":

    TestClass tc = new TestClass();
    tc.Element1 = "Hello World!";
    tc.Element2 = "Yo yo yo yo !";
    tc.Element3 = "\"£$%^&*^%$£\"£(£^^%&\"^%£$£\"$%";

Our resulting XML?

    <?xml version="1.0" encoding="utf-8"?>
    <TestClass> <!-- Schema removed for clarity -->
      <Element1>Hello World!</Element1>
      <Element2>Yo yo yo yo !</Element2>
      <Element3>"£$%^&amp;*^%$£"£(£^^%&amp;"^%£$£"$%</Element3>
    </TestClass>

Good work .NET! But what does BizTalk Server 2006 say if you try and interpret this XML?

There is no Unicode byte order mark. Cannot switch to Unicode.

Okay… I checked the file in EditPad and, after switching to hex mode, the Byte Order Mark was definitely in the file. Well, why don’t I just use the System.Web.HttpUtility.HtmlEncode() method to encode everything and hopefully sort this problem out? We’ll have to make some changes to the TestClass class:

    public class TestClass
    {
        public string Element1 { get; set; }
        public string Element2 { get; set; }

        [XmlIgnore()]
        public string Element3 { get; set; }

        [XmlElement("Element3")]
        public string EncodedElement3
        {
            get
            {
                return System.Web.HttpUtility.HtmlEncode(this.Element3);
            }
            set
            {
                this.Element3 = value;
            }
        }
    }

[XmlIgnoreAttribute()] tells the XmlSerializer not to serialize the public property. Instead, we want it to serialize a different public property, using [XmlElementAttribute()] to override its element name.

That should sort out any invalid characters right? We are encoding the text before passing it through the XmlSerializer, so what is the result? Using a smaller string of the above example, I get:

    <Element3>&amp;quot;&amp;#163;$%^&amp;amp;*</Element3>

Is this right? No! &quot; has turned into &amp;quot; and &#163; became &amp;#163;. The serializer escaped the & of each entity we had already encoded. IE8 says this is interpreted as:

<Element3>&quot;&#163;$%^&amp;*</Element3>

When we should be seeing this (from IE8’s rendering):

<Element3>"£$%^&*</Element3>

Now, even if we do put <![CDATA[ …… ]]> ourselves, the XmlSerializer is still none the wiser and you can end up with even worse results:

<Element3>&amp;lt;![CDATA[&amp;quot;&amp;#163;$%^&amp;amp;*]]&amp;gt;</Element3>
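
The double encoding can be reproduced without BizTalk or System.Web. The following is only a minimal sketch: the Sample class and the DoubleEncodingDemo helper are made-up names for illustration, and the pre-encoded string is typed by hand instead of coming from HtmlEncode(). It serializes the same characters once raw and once pre-encoded, to show that the XmlSerializer escapes every & it is given, including the ones that belong to entities we encoded ourselves.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical class used only for this demonstration
public class Sample
{
    public string Element3 { get; set; }
}

public static class DoubleEncodingDemo
{
    // Serializes to a string instead of a file, to keep the example self-contained
    public static string Serialize(Sample sample)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Sample));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, sample);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        // Raw text: the serializer escapes the ampersand once
        string raw = Serialize(new Sample { Element3 = "\"&" });

        // Text we encoded ourselves: each & is escaped a second time
        string pre = Serialize(new Sample { Element3 = "&quot;&amp;" });

        Console.WriteLine(raw);
        Console.WriteLine(pre);
        Console.WriteLine(pre.Contains("&amp;quot;&amp;amp;"));
    }
}
```

In other words, anything that is a plain string property is treated as text to be escaped, which is why hand-written entities and hand-written CDATA markers both come out mangled.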

 

Solution

The easiest way to get round this is to add an extra property, like the above example, for the sole purpose of serialization. But instead of using the return type of string, you should use System.Xml.XmlCDataSection instead, like this:

    [XmlIgnore()]
    public string Element3 { get; set; }

    [XmlElementAttribute("Element3")]
    public XmlCDataSection CDataElement
    {
        get
        {
            // An XmlDocument is only needed as a factory for the CDATA node
            XmlDocument xmlDoc = new XmlDocument();
            return xmlDoc.CreateCDataSection(this.Element3);
        }
        set
        {
            this.Element3 = value.Value;
        }
    }

Now, if you try this, you get the following result:

<Element3><![CDATA["£$%^&*]]></Element3>

Perfect! And the good thing is BizTalk Server 2006 interprets the content correctly, so any split functions or substrings work as expected!
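
For anyone who wants to try the whole thing in one go, here is a rough, self-contained sketch of the approach. The Program class and its ToXml and FromXml helpers are names I have made up for the example, it serializes to a string and not to a file, and the null checks are my own addition. Treat it as an illustration under those assumptions and not as the only way to do it.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public class TestClass
{
    public string Element1 { get; set; }

    // The property the rest of the application uses; hidden from the serializer
    [XmlIgnore]
    public string Element3 { get; set; }

    // Serialization-only property, written as <Element3><![CDATA[...]]></Element3>
    [XmlElement("Element3")]
    public XmlCDataSection CDataElement
    {
        get { return new XmlDocument().CreateCDataSection(Element3 ?? string.Empty); }
        set { Element3 = (value == null) ? null : value.Value; }
    }
}

public static class Program
{
    // Hypothetical helper: serialize to an in-memory string
    public static string ToXml(TestClass tc)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(TestClass));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, tc);
            return writer.ToString();
        }
    }

    // Hypothetical helper: read the object back from the XML string
    public static TestClass FromXml(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(TestClass));
        using (StringReader reader = new StringReader(xml))
        {
            return (TestClass)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        TestClass tc = new TestClass { Element1 = "Hello World!", Element3 = "\"£$%^&*" };

        string xml = ToXml(tc);
        Console.WriteLine(xml);

        // The setter on CDataElement restores Element3 when reading back
        TestClass copy = FromXml(xml);
        Console.WriteLine(copy.Element3 == tc.Element3);
    }
}
```

The extra property exists purely for the serializer. The rest of the code keeps working with the plain string in Element3.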

Tags:

Xml | .Net Framework
