Why is binary serialization faster than XML serialization?

Why is binary serialization considered faster than XML serialization?

+6
c# xml serialization
4 answers

Binary serialization is more efficient because it writes the raw data directly, whereas XML serialization has to analyze the data to generate the correct XML structure. In addition, depending on what data your objects hold, XML can carry a lot of redundant data.

+8

Consider serializing a double, for example:

  • binary serialization: write 8 bytes from a memory address to a stream

  • binary deserialization: read the same 8 bytes back

  • xml serialization: write a tag, convert the value to text, write a closing tag (almost three times the I/O and 1000x the CPU usage)

  • xml deserialization: read and validate the tag, read the string and parse it into a number, read and validate the closing tag (a bit more I/O overhead and somewhat more CPU)
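
To make the comparison concrete, here is a minimal sketch in Python rather than .NET (the element name `value` and the sample number are just assumptions for illustration). The binary path copies the 8 bytes of the double; the XML path has to convert to text, wrap it in tags, and parse it all back.

```python
import struct
import xml.etree.ElementTree as ET

value = 3.141592653589793

# Binary: the 8 bytes of the IEEE 754 double go straight into a buffer.
binary = struct.pack("<d", value)
restored_binary = struct.unpack("<d", binary)[0]

# XML: convert the number to text, wrap it in an opening and closing tag,
# then parse the markup and the text again to get the number back.
element = ET.Element("value")
element.text = repr(value)
xml_bytes = ET.tostring(element)
restored_xml = float(ET.fromstring(xml_bytes).text)

print(len(binary), len(xml_bytes))  # binary is 8 bytes; the XML is several times larger
```

Both paths give back the same number; the difference is only in how many bytes are moved and how much conversion work is done on the way.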

+11

Actually, as with everything, it depends on the data and the serializer.

Usually (although perhaps unwisely) people mean BinaryFormatter when they say "binary", but it has a number of drawbacks:

  • it adds a lot of type metadata (all of which takes up space)
  • by default, it includes field names (which can be verbose, especially for automatically implemented properties)

Conversely, XML usually has overheads such as:

  • tags, which add size and I/O
  • the need to parse tags (which is very expensive)
  • a lot of text encoding / decoding

Of course, XML compresses easily, which costs some CPU but significantly reduces bandwidth.

But this does not mean that one is always faster. I would point you to some sample statistics here (with full source code), to which I have added annotations for the serializer type (binary, xml, text, etc.). Look in particular at the first two results: XmlSerializer appears to beat BinaryFormatter on every value while keeping its cross-platform benefits. Of course, protobuf then outperforms XmlSerializer ;p

These numbers tie in pretty well with the ServiceStack benchmarks, here.

  Serializer                   Format        Length  Serialize  Deserialize
  BinaryFormatter              binary          1314       6746         6268
  XmlSerializer                xml             1049       3282         5132
  DataContractSerializer       xml              911       1411         4380
  NetDataContractSerializer    binary          1139       2014         5645
  JavaScriptSerializer         text (json)      528      12050        30558
  protobuf-net v2              binary           112        217          250
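
The figures above are for .NET serializers. If you want to run the same kind of comparison on your own data, a rough harness could look like the sketch below. It is Python, not .NET, the payload is a made-up assumption, and `pickle`, `json` and `ElementTree` only stand in for "binary", "text" and "xml"; the numbers it prints will not match the ones above.

```python
import json
import pickle
import timeit
import xml.etree.ElementTree as ET

# Hypothetical sample payload; results depend heavily on the shape of the data.
data = [{"id": i, "name": f"item{i}", "price": i * 0.5} for i in range(100)]

def to_xml(rows):
    """Build one <item> element per row, with one child element per field."""
    root = ET.Element("items")
    for row in rows:
        item = ET.SubElement(root, "item")
        for key, value in row.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root)

serializers = {
    "pickle (binary)": lambda: pickle.dumps(data),
    "json (text)": lambda: json.dumps(data).encode("utf-8"),
    "ElementTree (xml)": lambda: to_xml(data),
}

results = {}
for name, serialize in serializers.items():
    seconds = timeit.timeit(serialize, number=200)  # total time for 200 runs
    results[name] = (len(serialize()), seconds)
    print(f"{name:20} length={results[name][0]:6d} serialize={seconds:.4f}s")
```

The point is only that the answer comes from measuring a specific serializer on specific data, not from the word "binary" or "xml".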
+8

Well, first of all, XML is a bloated format. Every byte you send in binary form takes at least 2 or 3 bytes in XML. For example, to send the number 44 in binary you need only one byte. In XML you need an element tag plus two bytes for the digits: <N>44</N>, which is a lot more data.
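
The size difference for the number 44 is easy to check. A tiny Python sketch (Python is used only for illustration; the `<N>` tag comes from the example above):

```python
number = 44

binary = number.to_bytes(1, "big")        # a single byte, 0x2C
xml = f"<N>{number}</N>".encode("ascii")  # tag + two digits + closing tag

print(len(binary), len(xml))  # prints: 1 9
```
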
Another difference is the encoding / decoding time needed to process the message. Because binary data is so compact, it does not consume many clock cycles. If the binary data has a fixed structure, you can load it directly into memory and access each element without parsing or examining the data.
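
As a rough sketch of what "fixed structure" buys you (Python again; the record layout of a 32-bit id followed by a 64-bit price is purely an assumption for the example): because every record has the same size, record N sits at a known offset and can be read without looking at anything before it.

```python
import struct

# Hypothetical fixed layout: int32 id + float64 price, little-endian, no padding.
RECORD = struct.Struct("<id")

records = [(1, 9.5), (2, 12.25), (3, 0.75)]
buffer = b"".join(RECORD.pack(*r) for r in records)

def read_record(buf, index):
    """Jump straight to record `index`; nothing before it is examined."""
    return RECORD.unpack_from(buf, index * RECORD.size)

print(RECORD.size)             # 12
print(read_record(buffer, 2))  # (3, 0.75)
```

With XML there is no such shortcut: to find the third element you have to scan and parse everything in front of it.
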
XML is a text format that takes a few more steps to process. First, the format is bloated, so it consumes more memory. In addition, all the data is text and you may need it in binary form, so the XML has to be parsed. That parsing takes time no matter how fast your code is. ASN.1 is a "binary XML" format that is a good alternative to XML, but it needs to be parsed just as XML does. Also, if most of your data is textual rather than numeric, a binary format will not make much difference.
Another factor in speed is the overall size of your data. When you simply load and save a 1 KB binary file or a 3 KB XML file, you probably won't notice any difference in speed. This is because disks store data in blocks of a fixed size, and up to 4 KB fits easily into most disk blocks. So it does not matter to the disk whether it has to read 1 KB or 3 KB, since it reads the entire 4 KB block either way. But when the binary is 1 MB and the XML is 3 MB, the disk has to read many more blocks just to read the XML. (Or to write it.) At that point it even matters whether your XML is 3 MB, 2.99 MB or 3.01 MB.
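
The block arithmetic can be sketched like this (Python, assuming a 4 KB block size as in the paragraph above; real file systems and drives vary):

```python
import math

BLOCK_SIZE = 4096  # assumed block size in bytes
KB = 1024
MB = 1024 * KB

def blocks_needed(size_bytes, block_size=BLOCK_SIZE):
    """Number of whole blocks the disk must read to cover size_bytes."""
    return math.ceil(size_bytes / block_size)

for label, size in [("1 KB binary", 1 * KB), ("3 KB xml", 3 * KB),
                    ("1 MB binary", 1 * MB), ("3 MB xml", 3 * MB)]:
    print(f"{label:12} {blocks_needed(size):4d} block(s)")
```

Both small files cost one block each, while the 3 MB file needs three times as many blocks as the 1 MB one.
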
When transmitted over a text-only channel on top of TCP/IP (e-mail, for example), binary data is usually UU encoded. With UU encoding, your binary data grows by 1 byte for every 3 bytes of data. XML data does not need to be encoded, so the size difference becomes smaller and so does the speed difference. Binary data will still be faster, however, since the encoding / decoding routines can be really fast.
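
The "1 extra byte per 3 bytes" growth can be seen with Python's `binascii.b2a_uu` (a sketch only; the helper `uu_encode` and the sample data are assumptions, and each encoded line also carries a length character and a newline on top of the 4-for-3 expansion):

```python
import binascii

def uu_encode(data: bytes) -> bytes:
    """UU-encode data, 45 input bytes per output line (the classic line size)."""
    lines = [binascii.b2a_uu(data[i:i + 45]) for i in range(0, len(data), 45)]
    return b"".join(lines)

raw = bytes(range(256)) * 4  # 1024 bytes of arbitrary binary data
encoded = uu_encode(raw)

# Three input bytes become four printable characters, plus per-line overhead.
print(len(binascii.b2a_uu(b"Cat")))  # 6: length char + 4 data chars + newline
print(len(raw), len(encoded))
```
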
Basically, size matters. :-)


But with XML you have an additional alternative: you can send and store the XML in ZIP format. Microsoft Office does this in its newer versions. A Word document is created as an XML file but stored as part of a larger ZIP file. This combines the best of both worlds, because Word documents are mostly text, so a binary format would not add much speed. Zipping the XML makes storing and transferring the data much faster, simply by making it binary. Even more interesting, a compressed XML file can end up smaller than an uncompressed binary file, so the zipped XML becomes the faster option. (But that is cheating, since the XML is now binary ...)
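
A minimal sketch of the zipped-XML idea, in Python with an in-memory archive (the document content and the entry name `document.xml` are made up for the example; how much you save depends entirely on the data):

```python
import io
import zipfile

# Hypothetical text-heavy document: 200 short paragraphs.
paragraphs = "".join(
    f"<p>Paragraph {i}: binary serialization versus XML serialization.</p>"
    for i in range(200)
)
xml_bytes = f"<doc>{paragraphs}</doc>".encode("utf-8")

# Store the XML as one DEFLATE-compressed entry in an in-memory ZIP archive.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("document.xml", xml_bytes)
zipped = archive.getvalue()

# Reading it back: unzip first, then parse the XML as usual.
with zipfile.ZipFile(io.BytesIO(zipped)) as zf:
    restored = zf.read("document.xml")

print(len(xml_bytes), len(zipped))
```

Repetitive markup like this compresses very well, which is why the trade of some CPU time for fewer bytes on disk or on the wire usually pays off for text-heavy documents.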

+1
