JSON character encoding - is UTF-8 well supported by browsers, or should I use numeric escape sequences?

I am writing a web service that uses JSON to represent its resources, and I am a bit stuck on the best way to encode the JSON. Reading the JSON RFC ( http://www.ietf.org/rfc/rfc4627.txt ), it is clear that UTF-8 is the preferred encoding. But the RFC also describes a string escaping mechanism for specifying characters. I assume this would generally be used to escape non-ASCII characters, thereby making the resulting UTF-8 valid ASCII.

So let's say I have a JSON string that contains Unicode characters (code points) that are not ASCII. Should my web service just encode it as UTF-8 and return it, or should it escape all the non-ASCII characters and return pure ASCII?

I would like browsers to be able to execute the results using JSONP or eval. Does that affect the decision? My knowledge of browser JavaScript support for UTF-8 is lacking.

EDIT: I wanted to clarify that my main concern about how to encode the results is really about how browsers handle them. What I have read indicates that browsers may be sensitive to the encoding, particularly when using JSONP. I have not found any really good information on this, so I will have to do some testing to see what happens. Ideally, I would like to escape only the few characters that require it and simply UTF-8 encode the results.

+67
json web-services unicode utf-8
Feb 24 '09 at 20:57
5 answers

All JSON parsers can handle proper UTF-8 just as well as numeric escape sequences, as the JSON specification requires.

The option for JSON encoders to use numeric escape sequences instead simply gives you more choice. One reason you might choose numeric escape sequences is that the transport mechanism between your encoder and the intended decoder is not binary-safe.

Another reason you might use numeric escape sequences is to keep certain characters out of the stream, such as < , & and " , which may be interpreted as HTML if the JSON is placed into HTML without escaping, or if a browser wrongly interprets it as HTML. This can be a protection against HTML injection or cross-site scripting (note: some characters MUST be escaped in JSON, including " and \ ).

Some frameworks, including the JSON implementation in PHP, by default use numeric escape sequences on the encoder side for any character outside of ASCII. This is intended for maximum compatibility with limited transport mechanisms and the like. However, it should not be interpreted as an indication that JSON decoders have problems with UTF-8.
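As a rough illustration of that second reason, here is a minimal Python sketch. The helper name dumps_html_safe and the sample payload are hypothetical, not taken from the answer. It rewrites < , > and & as \uXXXX escapes after encoding. This is safe to do on the whole output because those characters can only occur inside JSON strings.

```python
import json

# Hypothetical helper: hide HTML-sensitive characters behind JSON's \uXXXX form.
HTML_SENSITIVE = {"<": "\\u003c", ">": "\\u003e", "&": "\\u0026"}

def dumps_html_safe(value):
    # Encode normally first; json.dumps already escapes " and \ as JSON requires.
    text = json.dumps(value, ensure_ascii=False)
    # <, > and & never appear in JSON syntax itself, only inside strings,
    # so a plain textual replacement cannot break the document structure.
    for char, escaped in HTML_SENSITIVE.items():
        text = text.replace(char, escaped)
    return text

payload = {"comment": "</script><b>Tom & Jerry</b>"}
encoded = dumps_html_safe(payload)
print(encoded)

# A JSON decoder turns the escapes back into the original characters.
assert json.loads(encoded) == payload
```

The decoder sees exactly the same data; only the bytes on the wire change.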
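To make the two encoder choices concrete, here is a small Python sketch (the sample resource is made up for illustration). Python's json module happens to expose the choice through the ensure_ascii parameter; other encoders will have their own switch.

```python
import json

resource = {"name": "café", "symbol": "€"}

# Choice 1: keep non-ASCII characters as they are and send the text as UTF-8.
raw_utf8 = json.dumps(resource, ensure_ascii=False)

# Choice 2: escape everything outside ASCII as \uXXXX (Python's default).
escaped = json.dumps(resource, ensure_ascii=True)

print(raw_utf8)
print(escaped)

# The escaped form is pure ASCII; this would raise if it were not.
escaped.encode("ascii")

# A conforming decoder yields the same data from either form.
assert json.loads(raw_utf8.encode("utf-8")) == resource
assert json.loads(escaped) == resource
```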

So, I think you could decide which to use like this:

  • Just use UTF-8, unless the storage or transport mechanism between your encoder and decoder is not binary-safe.

  • Otherwise, use numeric escape sequences.

+63
Feb 27 '09 at 14:03

I ran into a problem here. When I JSON-encode a string containing a character like "é", every browser returns the same "é", except IE, which returns "\u00e9".

PHP's json_decode() then fails if it finds "é", so for Firefox, Opera, Safari and Chrome I have to call utf8_encode() before json_decode().

Note: in my tests, IE and Firefox used their native JSON object; the other browsers used json2.js.

+14
Aug 11 '09 at 23:30

ASCII is not part of the picture anymore. Using UTF-8 encoding means that you are not using ASCII encoding. What you should use the escaping mechanism for is what the RFC says:

All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F)
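As a quick check of that rule, here is a short Python sketch (the sample string is made up). With ensure_ascii=False, Python's json.dumps escapes only the characters the RFC requires and leaves other Unicode characters alone.

```python
import json

# Contains a quotation mark, a reverse solidus, a control character (tab)
# and a non-ASCII letter.
text = 'say "hi" \\ then a tab:\t and é'

encoded = json.dumps(text, ensure_ascii=False)
print(encoded)

# The three mandatory cases are escaped...
assert '\\"' in encoded    # quotation mark
assert '\\\\' in encoded   # reverse solidus
assert '\\t' in encoded    # control character U+0009
# ...while the non-ASCII character passes through untouched.
assert 'é' in encoded

# Control characters without a short form get a \uXXXX escape.
assert json.dumps("\x01") == '"\\u0001"'

assert json.loads(encoded) == text
```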

+10
Feb 24 '09 at 21:03

I had the same problem. This worked for me; please give it a try.

 json_encode($array, JSON_UNESCAPED_UNICODE);
+4
Jan 29 '16 at 10:53

I had a similar problem with the é character... I think the comment "it is possible that the text you feed it is not UTF-8" is probably close to the mark here. I suspect the default collation in my instance was something else until I noticed and changed it to utf8... the problem is that the data was already there, so I am not sure whether the data was converted when I changed it in Workbench. The end result is that PHP will not JSON-encode the data; it just returns false. It does not matter which browser you use, because it is the server that causes my problem: PHP will not encode the data if this character is present. As I said, I am not sure whether this is due to converting the schema to utf8 after the data was already present, or simply a PHP bug. In this case, use json_encode(utf8_encode($string));
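One plausible reading of this, sketched in Python rather than PHP, is that the stored bytes were ISO-8859-1 (Latin-1) and not UTF-8. That is an assumption, since the answer does not confirm it. It would explain why converting first helps: PHP's utf8_encode() converts ISO-8859-1 to UTF-8. The function name to_utf8_json below is hypothetical.

```python
import json

def to_utf8_json(raw: bytes, source_encoding: str = "latin-1") -> bytes:
    """Decode bytes from their real encoding, then emit UTF-8 encoded JSON."""
    text = raw.decode(source_encoding)
    return json.dumps(text, ensure_ascii=False).encode("utf-8")

# "é" as a Latin-1 database column might hold it: the single byte 0xE9.
latin1_bytes = "é".encode("latin-1")

# Treated as UTF-8, that byte is invalid, which is the kind of input
# a JSON encoder expecting UTF-8 will reject.
try:
    latin1_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
assert not valid_utf8

# Converting from the real source encoding first produces valid UTF-8 JSON.
result = to_utf8_json(latin1_bytes)
assert json.loads(result) == "é"
```

If the data is already UTF-8, converting it again would double-encode it, so it is worth confirming the actual source encoding first.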

0
Dec 15 '15 at 9:31


