Use CDATA to store raw binary streams?

Instead of incurring the overhead of saving binary data as Base64, I was wondering whether you can store double-byte binary streams directly in XML files, using CDATA, comments, or something else?

+6
xml binary cdata base64
4 answers

You can save it as CDATA, but there is a risk that some byte sequence in the data will be read as the "]]>" terminator, which closes the CDATA section early. After a quick look at http://www.w3.org/TR/2006/REC-xml-20060816/#sec-cdata-sect it seems that you can have any sequence of characters except "]]>". Also check what counts as a valid XML Char.
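
A minimal sketch of the "]]>" problem, using Python's standard-library parser on text data only. The helper names `to_cdata` and `roundtrip` are made up for illustration, and splitting the terminator across two CDATA sections is a common workaround that this answer does not itself propose:

```python
import xml.etree.ElementTree as ET

def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting every "]]>" across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

def roundtrip(doc: str):
    """Return the root element's text, or None if the parser rejects the document."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError:
        return None

payload = "a]]>b"

# Naive wrapping: the payload's own "]]>" ends the section early.
naive = "<r><![CDATA[" + payload + "]]></r>"

# Split wrapping: no single section contains the full terminator.
safe = "<r>" + to_cdata(payload) + "</r>"

print("naive intact:", roundtrip(naive) == payload)
print("split intact:", roundtrip(safe) == payload)
```

This only deals with the terminator; it does nothing for bytes that are not valid XML characters in the first place.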

-1

The NUL character ('\0' in C) is not valid anywhere in XML, not even as an escape (&#0;).
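
A quick way to check this, assuming Python's standard `xml.etree.ElementTree` (expat-based) as the parser. The helper `parses` is a made-up name for this sketch:

```python
import xml.etree.ElementTree as ET

def parses(doc: bytes) -> bool:
    """Return True if the standard-library parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

literal_nul = parses(b"<a>\x00</a>")            # raw NUL byte in element content
escaped_nul = parses(b"<a>&#0;</a>")            # NUL written as a character reference
cdata_nul = parses(b"<a><![CDATA[\x00]]></a>")  # raw NUL byte inside a CDATA section
control = parses(b"<a>ok</a>")                  # well-formed document for comparison

print(literal_nul, escaped_nul, cdata_nul, control)
```

With this parser, only the last document should be accepted; other conforming parsers should behave the same way, but that is not tested here.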

+11

No, you cannot use CDATA to embed binary data in an XML file.

In XML 1.0 (XML 1.1 is more permissive, but not in a way that helps with control characters), the following restrictions apply to the characters in a CDATA section:

CData ::= (Char* - (Char* ']]>' Char*))
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

This means that there are several illegal characters, including:

  • control characters from 0x00 to 0x1F, except tab (0x09), line feed (0x0A), and carriage return (0x0D)
  • illegal UTF-8 sequences, such as the byte 0xFF or the overlong (non-canonical) form 0b1100000x 0b10xxxxxx

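In addition to this, in ordinary element content outside CDATA:

  • "<" is illegal, and ">" must be escaped when it would complete the sequence "]]>"
  • "&" usage is limited to references (&eacute; is OK if the entity is declared, &zajdalkdza; is not)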

Thus, CDATA is just a way to allow "<", ">" and "&" by forbidding only "]]>" instead. It does not solve the problem of bytes that are illegal as XML characters, as Unicode, or as UTF-8, which is the main problem.
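
The restrictions above can be sketched as a check on a byte string, assuming the document is encoded as UTF-8. Both function names are hypothetical; `is_xml_char` simply transcribes the Char production quoted earlier:

```python
def is_xml_char(cp: int) -> bool:
    """XML 1.0 Char production, applied to a single Unicode code point."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def fits_in_cdata(data: bytes) -> bool:
    """Can these bytes sit verbatim in one CDATA section of a UTF-8 document?"""
    try:
        # Strict decoding rejects bytes such as 0xFF and overlong forms.
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # The terminator and every non-Char code point are forbidden.
    return "]]>" not in text and all(is_xml_char(ord(ch)) for ch in text)

samples = [b"1 < 2 && 3 > 2", b"\x01", b"\xff", b"\xc0\x80", b"a]]>b"]
for sample in samples:
    print(sample, fits_in_cdata(sample))
```

Markup characters pass, while a control byte, an invalid UTF-8 byte, an overlong sequence, and the terminator are all rejected, so arbitrary binary data will usually fail this check.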

Solutions:

  • Use Base64: 33% overhead, but it is a standard and is well supported in all programming languages
  • Use BaseXML: only 20% overhead, but implementations are still limited
  • Do not encode binary data in XML; if possible, transfer it separately
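
As a rough illustration of the Base64 option, here is a round trip through Python's standard library. The element name `blob` and the `encoding` attribute are made up for this sketch, not part of any standard:

```python
import base64
import os
import xml.etree.ElementTree as ET

payload = os.urandom(3000)  # arbitrary binary data, may contain NUL or 0xFF bytes

# Encode the bytes to ASCII text and store it as ordinary element content.
root = ET.Element("blob", encoding="base64")  # illustrative names only
root.text = base64.b64encode(payload).decode("ascii")
xml_bytes = ET.tostring(root)

# Parse the document back and decode the text to recover the original bytes.
restored = base64.b64decode(ET.fromstring(xml_bytes).text)

# Base64 emits 4 output characters for every 3 input bytes.
overhead = len(root.text) / len(payload) - 1
print(f"round trip ok: {restored == payload}, overhead: {overhead:.0%}")
```

The payload size of 3000 bytes is chosen so that no padding is needed, which makes the 4:3 ratio visible exactly.
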
+7

XML is a text format - do not use it to store binary data. Put the binary blobs in separate files and add an element to your XML that references these files. If you want to keep all binary blobs in a single file, add an offset attribute or something like that ...
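
One possible shape for the single-file variant, as a sketch only. The file name `blobs.bin`, the element names, and the `length` attribute are assumptions added here; the answer itself only suggests an offset attribute:

```python
import os
import tempfile
import xml.etree.ElementTree as ET

blobs = [b"\x00\x01\x02", b"\xff\xfe]]>", b"third blob"]

workdir = tempfile.mkdtemp()
bin_path = os.path.join(workdir, "blobs.bin")  # hypothetical file name

# Append every blob to one binary file and record where each one starts.
index = ET.Element("blobs", file="blobs.bin")
with open(bin_path, "wb") as f:
    for data in blobs:
        ET.SubElement(index, "blob", offset=str(f.tell()), length=str(len(data)))
        f.write(data)
xml_text = ET.tostring(index, encoding="unicode")

def read_blob(entry: ET.Element, path: str) -> bytes:
    """Read one blob back using the offset and length stored in the XML."""
    with open(path, "rb") as f:
        f.seek(int(entry.get("offset")))
        return f.read(int(entry.get("length")))

restored = [read_blob(entry, bin_path) for entry in ET.fromstring(xml_text)]
print(xml_text)
print(restored == blobs)
```

The XML stays pure text, and the binary file can hold any bytes, including ones that could never appear in a CDATA section.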

+4
