Unicode base 64 encoding with java

Question

I am trying to Base64-encode and decode a UTF-8 string. In theory this should not be a problem, but when decoding I never get the correct characters back, only ?.

    String original = "خهعسيبنتا";
    B64encoder benco = new B64encoder();
    String enc = benco.encode(original);
    try {
        String dec = new String(benco.decode(enc.toCharArray()), "UTF-8");
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println("Original: " + original);
        prtHx("ara", original.getBytes());
        out.println("Encoded: " + enc);
        prtHx("enc", enc.getBytes());
        out.println("Decoded: " + dec);
        prtHx("dec", dec.getBytes());
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }

The console output is as follows:

    Original: خهعسيبنتا
    ara = 3F, 3F, 3F, 3F, 3F, 3F, 3F, 3F, 3F
    Encoded: Pz8/Pz8/Pz8/
    enc = 50, 7A, 38, 2F, 50, 7A, 38, 2F, 50, 7A, 38, 2F
    Decoded: ?????????
    dec = 3F, 3F, 3F, 3F, 3F, 3F, 3F, 3F, 3F

(prtHx just writes the hexadecimal values of the bytes to the output.) Am I doing something obviously wrong here?
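The question does not include prtHx. Assuming it simply prints a label followed by the bytes as two-digit hex values, as the output above suggests, a hypothetical stand-in could look like this (the class name HexDump and the helper toHex are illustrative, not from the question):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the question's prtHx helper; the original source is not shown.
public class HexDump {

    // Formats bytes as two-digit uppercase hex separated by ", ", e.g. "D8, AE".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            // & 0xFF turns the signed byte into 0..255 before formatting
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    // Prints "label = XX, XX, ..." like the console output in the question.
    static void prtHx(String label, byte[] bytes) {
        System.out.println(label + " = " + toHex(bytes));
    }

    public static void main(String[] args) {
        prtHx("enc", "Pz8/".getBytes(StandardCharsets.US_ASCII));
    }
}
```

Running main prints enc = 50, 7A, 38, 2F, the first four bytes of the Encoded line above.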

Andreas pointed out the correct solution: the getBytes() method uses the platform's default encoding (Cp1252), even though the source file itself is UTF-8. Using getBytes("UTF-8"), I could see that the encoded and decoded bytes were actually different. Further investigation showed that the encode method itself also called getBytes(). Changing that did the trick.
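A minimal sketch of the corrected round trip, assuming Java 8 or later: it swaps in the JDK's java.util.Base64 for the question's B64encoder (whose source is not shown) and names UTF-8 explicitly on both the encoding and the decoding side, so the platform default (Cp1252 here) never comes into play. The class name Utf8Base64 is illustrative. The letters are written as Unicode escapes so the result does not depend on which encoding javac assumes for the source file.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative round trip; java.util.Base64 stands in for the question's B64encoder.
public class Utf8Base64 {

    // The question's letters, written as Unicode escapes.
    static final String ORIGINAL = "\u062E\u0647\u0639\u0633\u064A\u0628\u0646\u062A\u0627";

    // String -> UTF-8 bytes -> Base64 text. Never use the no-argument getBytes() here.
    static String encode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(utf8);
    }

    // Base64 text -> bytes -> String, interpreting the bytes as UTF-8 again.
    static String decode(String base64) {
        byte[] utf8 = Base64.getDecoder().decode(base64);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String enc = encode(ORIGINAL);
        String dec = decode(enc);
        System.out.println("Encoded: " + enc);
        System.out.println("Round trip ok: " + ORIGINAL.equals(dec));
    }
}
```

Running main should print Encoded: 2K7Zh9i52LPZitio2YbYqtin, the same Base64 text as the corrected output below, followed by Round trip ok: true. Passing StandardCharsets.UTF_8 (Java 7+) instead of the string "UTF-8" also avoids the checked UnsupportedEncodingException.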

    try {
        String enc = benco.encode(original);
        String dec = new String(benco.decode(enc.toCharArray()), "UTF-8");
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println("Original: " + original);
        prtHx("ori", original.getBytes("UTF-8"));
        out.println("Encoded: " + enc);
        prtHx("enc", enc.getBytes("UTF-8"));
        out.println("Decoded: " + dec);
        prtHx("dec", dec.getBytes("UTF-8"));
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }

    System encoding: Cp1252
    Original: خهعسيبنتا
    ori = D8, AE, D9, 87, D8, B9, D8, B3, D9, 8A, D8, A8, D9, 86, D8, AA, D8, A7
    Encoded: 2K7Zh9i52LPZitio2YbYqtin
    enc = 32, 4B, 37, 5A, 68, 39, 69, 35, 32, 4C, 50, 5A, 69, 74, 69, 6F, 32, 59, 62, 59, 71, 74, 69, 6E
    Decoded: خهعسيبنتا
    dec = D8, AE, D9, 87, D8, B9, D8, B3, D9, 8A, D8, A8, D9, 86, D8, AA, D8, A7

Thanks.

java base64 unicode utf-8

emt14 Apr 18 '11 at 9:42

1 answer

Andreas_D · Accepted Answer · 2011-04-18T09:49:59+0000

String#getBytes() encodes characters using the platform's default encoding. The actual encoding of the string literal "خهعسيبنتا" is "defined" by the Java source file (you choose the character encoding when you create or save the file).

This may be why ara is encoded as 0x3F bytes: Cp1252 has no mapping for Arabic letters, so getBytes() replaces each one with the replacement byte 0x3F ('?').
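To see the effect described above, the sketch below encodes the same letters with an explicit windows-1252 charset (Java's canonical name for Cp1252, assumed to be available in the JDK) and with UTF-8. Because windows-1252 has no Arabic letters, String.getBytes(Charset) substitutes the charset's replacement byte, 0x3F, for each one; the no-argument getBytes() does the same whenever the platform default is Cp1252. On JDK 18 and later the default charset is UTF-8 (JEP 400), so the no-argument call may not reproduce the problem there. The class name CharsetDemo is illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative demo: the same String yields different bytes depending on the charset.
public class CharsetDemo {

    // The question's letters, written as Unicode escapes.
    static final String ARABIC = "\u062E\u0647\u0639\u0633\u064A\u0628\u0646\u062A\u0627";

    // Encodes with Cp1252; unmappable characters become the replacement byte '?' (0x3F).
    static byte[] cp1252Bytes(String s) {
        return s.getBytes(Charset.forName("windows-1252"));
    }

    // Encodes with UTF-8; each of these Arabic letters takes two bytes.
    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("default charset: " + Charset.defaultCharset());
        // getBytes() with no argument encodes with Charset.defaultCharset().
        System.out.println(Arrays.equals(ARABIC.getBytes(), ARABIC.getBytes(Charset.defaultCharset())));
        System.out.println("cp1252: " + Arrays.toString(cp1252Bytes(ARABIC)));
        System.out.println("utf-8 length: " + utf8Bytes(ARABIC).length);
    }
}
```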

Try:

    out.println("Original: " + original);
    prtHx("ara", original.getBytes("UTF-8"));
    out.println("Encoded: " + enc);
    prtHx("enc", enc.getBytes("UTF-8"));
    out.println("Decoded: " + dec);
    prtHx("dec", dec.getBytes("UTF-8"));