Serializing an object to a string: why does my encoding add stray characters?

I need to get a serialized XML representation of an object as a string. I am using XmlSerializer and a MemoryStream for this.

```csharp
XmlSerializer serializer = new XmlSerializer(typeof(MyClass));
using (MemoryStream stream = new MemoryStream())
{
    using (XmlTextWriter writer = new XmlTextWriter(stream, Encoding.UTF8))
    {
        serializer.Serialize(writer, myClass);
        string xml = Encoding.UTF8.GetString(stream.ToArray());
        // other chars may be added from the encoding.
        xml = xml.Substring(xml.IndexOf(Convert.ToChar(60)));
        xml = xml.Substring(0, (xml.LastIndexOf(Convert.ToChar(62)) + 1));
        return xml;
    }
}
```

Look at the xml.Substring lines for a moment. Even though I pass the encoding to both the XmlTextWriter and GetString (and I use memoryStream.ToArray(), so I only work with the data in the stream buffer), the resulting string contains an extra character that is not part of the XML. In my case, a "?" appears at the beginning of the string. That's why I take the substring from the first '<' to the last '>', to make sure I only get the good stuff.

Strangely, when I look at this string in the debugger (Text Visualizer), I don't see the "?". It only shows up when I paste the visualizer's contents into Notepad or a similar editor.

So, while the code above (Substring, etc.) does the job, what is actually happening here? Is some unprintable byte being included that the text visualizer doesn't render?

2 answers

You can drop the BOM (byte order mark) by specifying the encoding exactly, i.e. instead of Encoding.UTF8, try using:

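```csharp
using (MemoryStream stream = new MemoryStream())
{
    // false = do not emit a byte order mark
    var enc = new UTF8Encoding(false);
    using (XmlTextWriter writer = new XmlTextWriter(stream, enc))
    {
        serializer.Serialize(writer, myClass);
    } // disposing the writer flushes it and also closes the stream

    // ToArray() still works on a closed MemoryStream;
    // stream.Length would throw ObjectDisposedException at this point.
    string xml = enc.GetString(stream.ToArray());
}
```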

What you are looking at is a byte order mark (BOM). This is normal in UTF-8!

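In short, for my commenting fans: a BOM is a marker at the start of the data that identifies the byte order (endianness) of the text. UTF-8 has no byte order, so there the BOM (the bytes EF BB BF) simply signals that the data is UTF-8.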
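To make this concrete, here is a small standalone sketch (a hypothetical console program, not taken from either answer). It compares the preamble of Encoding.UTF8 with that of new UTF8Encoding(false), and shows that Encoding.UTF8.GetString does not strip a leading BOM: it decodes it as the invisible character U+FEFF, which is most likely the "?" seen in the question once the text is pasted elsewhere.

```csharp
using System;
using System.Text;

class BomDemo
{
    static void Main()
    {
        // Encoding.UTF8 carries a 3-byte preamble: the UTF-8 BOM.
        byte[] withBom = Encoding.UTF8.GetPreamble();
        Console.WriteLine(BitConverter.ToString(withBom)); // EF-BB-BF

        // new UTF8Encoding(false) has an empty preamble, so no BOM is written.
        byte[] noBom = new UTF8Encoding(false).GetPreamble();
        Console.WriteLine(noBom.Length); // 0

        // GetString does not skip a BOM; it decodes it as U+FEFF.
        byte[] bytes = { 0xEF, 0xBB, 0xBF, (byte)'<', (byte)'a', (byte)'/', (byte)'>' };
        string s = Encoding.UTF8.GetString(bytes);
        Console.WriteLine(s.Length);           // 5
        Console.WriteLine(s[0] == '\uFEFF');   // True
    }
}
```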

What you can do is either a) use ASCII as your encoding, which drops the byte order mark, or b) why not just leave it? It is genuinely useful in your XML string.

Marc Gravell, in the other answer, provides a third alternative: create your own encoding object and pass false to its constructor to suppress the byte order mark.
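A minimal end-to-end sketch of that alternative, assuming a hypothetical MyClass with one public property standing in for the question's type (SerializeDemo and Serialize are likewise illustrative names). It serializes the same object once with Encoding.UTF8 and once with new UTF8Encoding(false) and checks the first character of each decoded string. The bytes are read with ToArray() after the writer is disposed, because disposing the XmlTextWriter flushes it and also closes the MemoryStream.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for the question's MyClass.
public class MyClass
{
    public string Name { get; set; } = "example";
}

public static class SerializeDemo
{
    // Serializes obj through an XmlTextWriter using the given encoding,
    // then decodes the raw bytes with Encoding.UTF8.GetString.
    public static string Serialize(MyClass obj, Encoding enc)
    {
        var serializer = new XmlSerializer(typeof(MyClass));
        using (var stream = new MemoryStream())
        {
            using (var writer = new XmlTextWriter(stream, enc))
            {
                serializer.Serialize(writer, obj);
            } // writer flushed and stream closed here

            // ToArray still works on a closed MemoryStream.
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }

    public static void Main()
    {
        var obj = new MyClass();
        string withBom = Serialize(obj, Encoding.UTF8);
        string noBom = Serialize(obj, new UTF8Encoding(false));

        Console.WriteLine(withBom[0] == '\uFEFF'); // True: leading BOM
        Console.WriteLine(noBom[0] == '<');        // True: starts directly with XML
    }
}
```

With the BOM-free encoding, the Substring trimming from the question is no longer needed.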

