I am trying to Base64-encode and then decode a UTF-8 string. In theory this should not be a problem, but after decoding, the output never shows the correct characters, only ?.
    String original = "خهعسيبنتا";
    B64encoder benco = new B64encoder();
    String enc = benco.encode(original);
    try {
        String dec = new String(benco.decode(enc.toCharArray()), "UTF-8");
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println("Original: " + original);
        prtHx("ara", original.getBytes());
        out.println("Encoded: " + enc);
        prtHx("enc", enc.getBytes());
        out.println("Decoded: " + dec);
        prtHx("dec", dec.getBytes());
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
The console output is as follows:
Original: خهعسيبنتا
ara = 3F, 3F, 3F, 3F, 3F, 3F, 3F, 3F, 3F
Encoded: Pz8/Pz8/Pz8/
enc = 50, 7A, 38, 2F, 50, 7A, 38, 2F, 50, 7A, 38, 2F
Decoded: ?????????
dec = 3F, 3F, 3F, 3F, 3F, 3F, 3F, 3F, 3F
prtHx simply writes the hexadecimal value of each byte to the output. Am I doing something obviously wrong here?
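The prtHx helper itself is not shown in the post. A minimal sketch of what such a helper might look like is given here so the dumps can be reproduced; the class name HexDump and the method toHex are hypothetical, and the real implementation may differ.

```java
import java.nio.charset.StandardCharsets;

public class HexDump {
    // Hypothetical stand-in for the post's prtHx helper (the original is not shown).
    // Formats each byte as two uppercase hex digits, separated by ", ".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            // Mask with 0xFF so negative byte values are not sign-extended.
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    // Prints "label = XX, XX, ..." in the same layout as the dumps in the post.
    static void prtHx(String label, byte[] bytes) {
        System.out.println(label + " = " + toHex(bytes));
    }

    public static void main(String[] args) {
        prtHx("abc", "abc".getBytes(StandardCharsets.US_ASCII)); // abc = 61, 62, 63
    }
}
```

A dump made this way shows the bytes that actually went into the encoder, which is more reliable than judging by what the console displays.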
Andreas pointed out the correct solution: getBytes() without arguments uses the platform default encoding (Cp1252 here), even though the source file itself is UTF-8. Cp1252 cannot represent Arabic characters, so each one was replaced by ? (0x3F) before it ever reached the encoder. After switching to getBytes("UTF-8"), I could see that the bytes being encoded and decoded were actually different. Further investigation showed that the encode method itself was calling getBytes() internally. Changing that to use UTF-8 did the trick beautifully.
    try {
        String enc = benco.encode(original);
        String dec = new String(benco.decode(enc.toCharArray()), "UTF-8");
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println("Original: " + original);
        prtHx("ori", original.getBytes("UTF-8"));
        out.println("Encoded: " + enc);
        prtHx("enc", enc.getBytes("UTF-8"));
        out.println("Decoded: " + dec);
        prtHx("dec", dec.getBytes("UTF-8"));
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
System encoding: Cp1252
Original: خهعسيبنتا
ori = D8, AE, D9, 87, D8, B9, D8, B3, D9, 8A, D8, A8, D9, 86, D8, AA, D8, A7
Encoded: 2K7Zh9i52LPZitio2YbYqtin
enc = 32, 4B, 37, 5A, 68, 39, 69, 35, 32, 4C, 50, 5A, 69, 74, 69, 6F, 32, 59, 62, 59, 71, 74, 69, 6E
Decoded: خهعسيبنتا
dec = D8, AE, D9, 87, D8, B9, D8, B3, D9, 8A, D8, A8, D9, 86, D8, AA, D8, A7
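For comparison, the same round trip can be sketched with the JDK's built-in java.util.Base64 (Java 8 and later) in place of the custom B64encoder class, which is not shown here. The class and method names below are hypothetical. ISO-8859-1 stands in for Cp1252 because it is guaranteed to be available on every JVM and, like Cp1252, cannot represent Arabic. The string is written with Unicode escapes so the example does not depend on the encoding the source file is compiled with.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Utf8RoundTrip {
    // Encode with an explicit charset so the result does not depend on the platform default.
    static String encodeUtf8(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // Decode the Base64 text and interpret the bytes as UTF-8.
    static String decodeUtf8(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    // Reproduces the bug: a single-byte Latin charset (standing in for Cp1252)
    // replaces every unmappable character with '?' (0x3F) before encoding.
    static String encodeLossy(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        // The nine Arabic letters from the post, as Unicode escapes.
        String original = "\u062E\u0647\u0639\u0633\u064A\u0628\u0646\u062A\u0627";

        String lossy = encodeLossy(original);
        String enc = encodeUtf8(original);
        String dec = decodeUtf8(enc);

        System.out.println("Lossy encoded: " + lossy);
        System.out.println("UTF-8 encoded: " + enc);
        System.out.println("Round trip ok: " + original.equals(dec));
    }
}
```

Passing a Charset object rather than a charset name also removes the need to catch UnsupportedEncodingException.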
Thanks.