All JSON parsers can handle literal UTF-8 just as well as numeric escape sequences, as required by the JSON specification.
The ability of JSON encoders to use numeric escape sequences instead just gives you more choice. One reason to choose numeric escape sequences is that the transport mechanism between your encoder and the intended decoder is not binary-safe.
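As a quick check of that claim, here is a minimal sketch using Python's standard `json` module (my choice of language for illustration; the variable names are made up). It decodes the same one-character string written as literal UTF-8 and as a numeric escape:

```python
import json

# The same one-character string, written two ways as JSON text.
literal_json = '"é"'.encode("utf-8")   # UTF-8 bytes containing the literal character
escaped_json = b'"\\u00e9"'            # pure ASCII bytes containing the escape \u00e9

decoded_literal = json.loads(literal_json)
decoded_escaped = json.loads(escaped_json)

# Both forms decode to the identical string.
same = decoded_literal == decoded_escaped
print(same, decoded_literal)
```

Any conforming decoder should behave the same way; Python is only standing in as one example here.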
Another reason to use numeric escape sequences is to keep certain characters out of the stream, such as < , & and " , which may be interpreted as HTML if the JSON is placed into a page without HTML escaping, or if a browser wrongly interprets it as HTML. This can be a defence against HTML injection or cross-site scripting (note: some characters MUST be escaped in JSON anyway, including " and \ ).
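A rough sketch of that idea in Python, assuming you post-process the encoder's output yourself. `dumps_html_safe` is a hypothetical helper name, not a library function, and escaping `>` as well is my own addition:

```python
import json

def dumps_html_safe(value):
    """Serialize to JSON, then swap <, > and & for \\uXXXX escapes (hypothetical helper)."""
    text = json.dumps(value)
    # These characters never appear in JSON syntax itself, only inside
    # string values, so a plain textual replace cannot break the structure.
    return (text.replace("<", "\\u003c")
                .replace(">", "\\u003e")
                .replace("&", "\\u0026"))

payload = {"note": "</script><b>Tom & Jerry</b>"}
safe = dumps_html_safe(payload)

# The escaped text still decodes to the original data.
round_trip = json.loads(safe)
print(safe)
```

This only covers the characters named above; it is not a substitute for proper HTML escaping when the context calls for it.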
Some frameworks, including PHP's JSON implementation, by default apply numeric escape sequences on the encoder side to every character outside ASCII. This is intended for maximum compatibility with limited transport mechanisms and the like. However, it should not be taken as a sign that JSON decoders have any problem with UTF-8.
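Python's standard `json` module happens to have the same default, so it can serve as an illustration (this shows Python's behaviour, not PHP's). A small sketch, with made-up variable names:

```python
import json

text = "café ☕"

# Default: ensure_ascii=True, so non-ASCII characters become \uXXXX escapes.
ascii_only = json.dumps(text)

# Opt out to keep the characters as-is; the result can then be sent as UTF-8.
utf8_form = json.dumps(text, ensure_ascii=False)

print(ascii_only)
print(utf8_form)

# Either form decodes back to the same string.
both_equal = json.loads(ascii_only) == json.loads(utf8_form) == text
```

The escaped form is longer but survives ASCII-only channels; the unescaped form is more compact and readable.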
So I guess you just need to decide which to use like this:
Just use UTF-8, unless your storage or transport mechanism between the encoder and decoder is not binary-safe.
Otherwise, use numeric escape sequences.
thomasrutter Feb 27 '09 at 14:03