Well, I found some interesting things, although no good solution yet.
One obvious thing I've tried is to play with encodings. There are a few candidates that seem like they should really work:
- Latin-1 (also known as ISO-8859-1): a single-byte encoding that maps one-to-one onto the first 256 Unicode code points. In theory it should be enough to declare the content type as "text/plain; charset=ISO-8859-1" and get one character per byte. Alas, due to the idiotic logic of browsers (and the even more idiotic mandate in HTML5!), some transcoding takes place that mangles the control-character range (codes 128–159, i.e. 0x80–0x9F). This happens because of the mandated assumption that the encoding is really Windows-1252 (why? for some silly historical reasons... but that's how it is); see the sketch after this list.
- UCS-2: a fixed-length 2-byte encoding that preceded UTF-16; it simply splits 16-bit character codes into 2 bytes. Alas, browsers do not seem to support it.
- UTF-16 could work in theory, but there is the problem of the surrogate-pair range (0xD800–0xDFFF), which is reserved: if byte pairs that would decode to those code points are included, corruption occurs.
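For reference, here is a sketch of what that remapping actually does, based on the standard Windows-1252 code page (the variable name is mine). When a response labeled ISO-8859-1 is decoded as Windows-1252, the 32 bytes 0x80–0x9F come out as the code points below; the five bytes undefined in Windows-1252 pass through unchanged.

```javascript
// Code points that bytes 0x80-0x9F turn into when content labeled
// ISO-8859-1 is decoded as Windows-1252; index 0 corresponds to 0x80.
// The five bytes undefined in Windows-1252 (0x81, 0x8D, 0x8F, 0x90,
// 0x9D) pass through unchanged.
var CP1252 = [
  0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
  0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
  0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
  0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178
];
```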
However: it looks like the transcoding applied to Latin-1 may be reversible, and if so, I'm sure I could make use of it in the end. All mutations go from single-byte values (0x00–0xFF) to code points above 0xFF, and there are no ambiguous mappings, at least in Firefox. If this holds for other browsers as well, it will be possible to map the values back and undo the effects of the automatic transcoding. And that would work across several browsers, including IE (with the caveat that null bytes may need special handling).
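A minimal sketch of that reversal, reusing the CP1252 table above (the function name is mine, and this assumes the only change the browser made is the 0x80–0x9F remapping):

```javascript
// Reverse lookup: mutated code point -> original byte value.
var REVERSE = {};
for (var i = 0; i < CP1252.length; i++) {
  REVERSE[CP1252[i]] = 0x80 + i;
}

// Recover the original byte values from the string the browser
// produced after decoding the "ISO-8859-1" response as Windows-1252.
function recoverBytes(text) {
  var bytes = [];
  for (var i = 0; i < text.length; i++) {
    var code = text.charCodeAt(i);
    // Values <= 0xFF were left alone; anything above 0xFF must be
    // one of the mutated control characters, so map it back.
    bytes.push(code <= 0xFF ? code : REVERSE[code]);
  }
  return bytes;
}
```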
Finally, some useful links for data type conversions:
StaxMan Sep 14 '10 at 1:11