UTF Encoding for Chinese Characters

Question

I get a String from an object returned by an Axis web service. Since the string was not what I expected, I checked it by converting it to bytes, and I get C3A4C2 BDC2A0 C3A5C2 A5C2BD C3A5C2 90C297 in hex, when I expected E4BDA0 E5A5BD E59097, which is 你好吗 in UTF-8.

Any ideas what might cause 你好吗 to become C3A4C2 BDC2A0 C3A5C2 A5C2BD C3A5C2 90C297? I did a Google search, but all I found was a Chinese site describing the same problem in Python. Any ideas would be great, thanks!

java encoding utf

Maurice Jul 27 '11 at 1:20

1 answer

Ray Toal · Accepted Answer · 2011-07-27T01:24:41+0000

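You have what is known as double encoding.

You have the three-character sequence "你好吗", which, as you correctly point out, is encoded in UTF-8 as E4BDA0 E5A5BD E59097.

But now take each byte of THAT encoding and encode it in UTF-8 again, as if it were a character of its own. Start with E4. What is that code point in UTF-8? Try it! It's C3 A4! This is what happens when UTF-8 bytes are decoded as ISO-8859-1 somewhere along the way and the resulting string is then encoded as UTF-8 again.

You get the idea... :-)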
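To see where C3 A4 comes from, the two-byte UTF-8 pattern 110xxxxx 10xxxxxx can be applied to U+00E4 by hand. The following is only a minimal sketch; the class and method names (TwoByteUtf8, encodeTwoByte) are made up for illustration and are not part of any library.

```java
public class TwoByteUtf8 {
    // Encodes a code point in the range U+0080..U+07FF as two UTF-8 bytes.
    // Hypothetical helper, written only to illustrate the bit layout.
    static int[] encodeTwoByte(int codePoint) {
        if (codePoint < 0x80 || codePoint > 0x7FF) {
            throw new IllegalArgumentException("not a two-byte code point");
        }
        int lead = 0xC0 | (codePoint >> 6);     // 110xxxxx: upper 5 bits
        int trail = 0x80 | (codePoint & 0x3F);  // 10xxxxxx: lower 6 bits
        return new int[] { lead, trail };
    }

    public static void main(String[] args) {
        int[] bytes = encodeTwoByte(0xE4);
        System.out.printf("%02x %02x%n", bytes[0], bytes[1]); // prints: c3 a4
    }
}
```

Every byte of the original encoding is in the range 80 to FF, so each one expands to two bytes this way, which is why the 9 expected bytes turned into 18.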

Here is a Java program that illustrates this:

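import java.nio.charset.StandardCharsets;

public class DoubleEncoding {
    public static void main(String[] args) {
        // Correct UTF-8 encoding of the text: e4 bd a0 e5 a5 bd e5 90 97
        byte[] encoding1 = "你好吗".getBytes(StandardCharsets.UTF_8);
        // Misread those bytes as ISO-8859-1: one character per byte
        String string1 = new String(encoding1, StandardCharsets.ISO_8859_1);
        for (byte b : encoding1) {
            System.out.printf("%02x ", b);
        }
        System.out.println();
        // Encoding the misread string as UTF-8 again gives the double encoding
        byte[] encoding2 = string1.getBytes(StandardCharsets.UTF_8);
        for (byte b : encoding2) {
            System.out.printf("%02x ", b);
        }
        System.out.println();
    }
}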
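If the string you receive really is double encoded this way, one round of the damage can be undone by reversing the steps. This is a sketch under the assumption that the wrong charset in between was ISO-8859-1, which matches the bytes in the question; the class and method names (DoubleEncodingRepair, garble, repair) are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class DoubleEncodingRepair {
    // Reproduces the damage: UTF-8 bytes misread as ISO-8859-1 characters.
    static String garble(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undoes one round: map each character back to a single byte,
    // then decode those bytes as the UTF-8 they originally were.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4f60\u597d\u5417"; // 你好吗, escaped to avoid source-encoding issues
        String garbled = garble(original);
        System.out.println(repair(garbled).equals(original)); // prints: true
    }
}
```

This only works around the symptom. The cleaner fix is to find the place where the bytes are decoded with the wrong charset and correct it there.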
