Serializing an object to XML when it contains invalid characters

I am serializing an object containing HTML data in a String property.

    Dim Formatter As New Xml.Serialization.XmlSerializer(GetType(MyObject))
    Dim fs As New FileStream(FilePath, FileMode.Create)
    Formatter.Serialize(fs, Ob)
    fs.Close()

But when I read the XML back into the object:

    Dim Formatter As New Xml.Serialization.XmlSerializer(GetType(MyObject))
    Dim fs As New FileStream(FilePath, FileMode.Open)
    Dim Ob = CType(Formatter.Deserialize(fs), MyObject)
    fs.Close()

I get this error:

 "'', hexadecimal value 0x14, is an invalid character. Line 395, position 22." 

Is the only way to prevent this .NET error to avoid invalid characters in the data?

What is going on here and how can I fix it?

4 answers

This should really have failed at the serialization stage, because 0x14 is not a valid character in XML. It cannot be escaped either, not even as the character reference &#x14;, because the character is excluded from the set of characters the XML model allows. I am actually surprised that the serializer lets it through, since that makes the serializer non-conforming.

Can you strip the invalid characters from the string before it is serialized? And why is there a 0x14 in your HTML in the first place?
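Stripping them could look like the minimal sketch below. It assumes .NET Framework 4 or later (for XmlConvert.IsXmlChar and XmlConvert.IsXmlSurrogatePair); the XmlCharFilter class and RemoveInvalidXmlChars method are hypothetical names, not part of any library:

```csharp
using System.Text;
using System.Xml;

public static class XmlCharFilter
{
    // Returns a copy of 'text' without the characters XML 1.0 does not allow
    // (such as U+0014). Valid surrogate pairs (code points above U+FFFF) are kept;
    // lone surrogates are dropped.
    public static string RemoveInvalidXmlChars(string text)
    {
        if (text == null) return null;

        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (XmlConvert.IsXmlChar(c))
            {
                sb.Append(c);
            }
            else if (i + 1 < text.Length && XmlConvert.IsXmlSurrogatePair(text[i + 1], c))
            {
                // IsXmlSurrogatePair takes (lowChar, highChar), in that order.
                sb.Append(c).Append(text[i + 1]);
                i++;
            }
            // Any other character is skipped.
        }
        return sb.ToString();
    }
}
```

You would call it just before serializing, e.g. Ob.Html = XmlCharFilter.RemoveInvalidXmlChars(Ob.Html), where Html stands in for whatever the string property is really called. Keep in mind that this silently discards the offending characters.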

Or could it be that you write in one encoding and read in another?


I set the CheckCharacters property of XmlReaderSettings to false. I would only advise doing this if you serialized the data yourself with the XmlSerializer; if the XML comes from an unknown source, it is not a good idea.

    using System.IO;
    using System.Xml;
    using System.Xml.Serialization;

    public static T Deserialize<T>(string xml)
    {
        // Tolerate character references such as &#x14; that XML 1.0 forbids
        var xmlReaderSettings = new XmlReaderSettings() { CheckCharacters = false };
        using (XmlReader xmlReader = XmlReader.Create(new StringReader(xml), xmlReaderSettings))
        {
            XmlSerializer xs = new XmlSerializer(typeof(T));
            return (T)xs.Deserialize(xmlReader);
        }
    }
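Applied to the file-based code in the question, the same setting could be used as in this sketch, which reads the file through XmlReader.Create(string, XmlReaderSettings) instead of a bare FileStream. LenientXml and DeserializeFile are hypothetical names, and the same caveat applies: only use it for files you wrote yourself.

```csharp
using System.Xml;
using System.Xml.Serialization;

public static class LenientXml
{
    // Deserializes a file produced by XmlSerializer while accepting character
    // references such as &#x14;, which the default reader rejects.
    public static T DeserializeFile<T>(string filePath)
    {
        var settings = new XmlReaderSettings { CheckCharacters = false };

        // 'using' closes the file even if deserialization throws.
        using (XmlReader reader = XmlReader.Create(filePath, settings))
        {
            var serializer = new XmlSerializer(typeof(T));
            return (T)serializer.Deserialize(reader);
        }
    }
}
```

In the question's terms that would be Dim Ob = LenientXml.DeserializeFile(Of MyObject)(FilePath) from VB.NET.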

You really need to post the code of the class you are trying to serialize and deserialize. In the meantime, I will make a guess.

Most likely, the invalid character is in a field or property of type string. Assuming you cannot avoid having that character there at all, you will need to serialize the value as a byte array:

    using System.Text;
    using System.Xml.Serialization;

    [XmlRoot("root")]
    public class HasBase64Content
    {
        internal HasBase64Content() { }

        // The raw string is not written to the XML...
        [XmlIgnore]
        public string Content { get; set; }

        // ...its UTF-8 bytes are written as base64 instead
        [XmlElement]
        public byte[] Base64Content
        {
            get { return Content == null ? null : Encoding.UTF8.GetBytes(Content); }
            set { Content = value == null ? null : Encoding.UTF8.GetString(value); }
        }
    }

This causes the XML to look like this:

    <?xml version="1.0" encoding="utf-8"?>
    <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <Base64Content>AAECAwQFFA==</Base64Content>
    </root>

I see that you probably prefer VB.NET:

    Imports System.Text
    Imports System.Xml.Serialization

    <XmlRoot("root")> _
    Public Class HasBase64Content
        Private _content As String

        <XmlIgnore()> _
        Public Property Content() As String
            Get
                Return _content
            End Get
            Set(ByVal value As String)
                _content = value
            End Set
        End Property

        <XmlElement()> _
        Public Property Base64Content() As Byte()
            Get
                If Content Is Nothing Then Return Nothing
                Return Encoding.UTF8.GetBytes(Content)
            End Get
            Set(ByVal value As Byte())
                If value Is Nothing Then
                    Content = Nothing
                    Return
                End If
                Content = Encoding.UTF8.GetString(value)
            End Set
        End Property
    End Class
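To see why the byte-array approach sidesteps the problem, here is a minimal sketch (not part of the answer above) that serializes only a byte[] with XmlSerializer and reads it back. The Base64RoundTrip class and its methods are hypothetical names. Because the element text is base64, it contains only letters, digits, '+', '/' and '=', so the default reader accepts it whatever characters the original string held.

```csharp
using System.IO;
using System.Text;
using System.Xml.Serialization;

public static class Base64RoundTrip
{
    // Serializes the UTF-8 bytes of 'content'; XmlSerializer writes a byte[]
    // as xs:base64Binary text.
    public static string ToXml(string content)
    {
        var serializer = new XmlSerializer(typeof(byte[]));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, Encoding.UTF8.GetBytes(content));
            return writer.ToString();
        }
    }

    // Reads XML produced by ToXml with the default (strict) reader and
    // decodes the original string, control characters included.
    public static string FromXml(string xml)
    {
        var serializer = new XmlSerializer(typeof(byte[]));
        using (var reader = new StringReader(xml))
        {
            var bytes = (byte[])serializer.Deserialize(reader);
            return Encoding.UTF8.GetString(bytes);
        }
    }
}
```

For the string "\u0000\u0001\u0002\u0003\u0004\u0005\u0014" the base64 text is AAECAwQFFA==, the same value as in the sample XML in this answer.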

I would have expected .NET to handle this, but you can also look at the XmlSerializer and XmlReaderSettings classes (see the sample generic method below):

    using System.IO;
    using System.Xml;
    using System.Xml.Serialization;

    public static T Deserialize<T>(string xml)
    {
        var xmlReaderSettings = new XmlReaderSettings()
        {
            ConformanceLevel = ConformanceLevel.Fragment,
            ValidationType = ValidationType.None
        };
        using (XmlReader xmlReader = XmlReader.Create(new StringReader(xml), xmlReaderSettings))
        {
            // Second argument is the default namespace for all XML elements
            XmlSerializer xs = new XmlSerializer(typeof(T), "");
            return (T)xs.Deserialize(xmlReader);
        }
    }

I would also check whether there are encoding problems in the code (Unicode, UTF-8, etc.). The hexadecimal value 0x14 is not a character you would expect in XML :)

