How do I use CharsetDecoder.decode(ByteBuffer, CharBuffer, endOfInput)?

I have a problem with the CharsetDecoder class.

First code example (which works):

    final CharsetDecoder dec = Charset.forName("UTF-8").newDecoder();
    final ByteBuffer b = ByteBuffer.allocate(3);
    final byte[] tab = new byte[]{(byte)-30, (byte)-126, (byte)-84}; //char €
    for (int i=0; i<tab.length; i++){
        b.put(tab, i, 1);
    }
    try {
        b.flip();
        System.out.println("a" + dec.decode(b).toString() + "a");
    } catch (CharacterCodingException e1) {
        e1.printStackTrace();
    }

Result a€a

But when I execute this code:

    final CharsetDecoder dec = Charset.forName("UTF-8").newDecoder();
    final CharBuffer chars = CharBuffer.allocate(3);
    final byte[] tab = new byte[]{(byte)-30, (byte)-126, (byte)-84}; //char €
    for (int i=0; i<tab.length; i++){
        ByteBuffer buffer = ByteBuffer.wrap(tab, i, 1);
        dec.decode(buffer, chars, i == 2);
    }
    dec.flush(chars);
    System.out.println("a" + chars.toString() + "a");

Result a

Why isn't the result the same?

How do I use the decode(ByteBuffer, CharBuffer, endOfInput) method of the CharsetDecoder class to get the result a€a?

- EDIT -

Based on Jesper's code, I came up with the following. It is not perfect, but it works with step = 1, 2 and 3:

    final CharsetDecoder dec = Charset.forName("UTF-8").newDecoder();
    final CharBuffer chars = CharBuffer.allocate(6);
    final byte[] tab = new byte[]{(byte)97, (byte)-30, (byte)-126, (byte)-84, (byte)97, (byte)97}; //char €
    final ByteBuffer buffer = ByteBuffer.allocate(10);
    final int step = 3;
    for (int i = 0; i < tab.length; i++) {
        // Add the next byte to the buffer
        buffer.put(tab, i, step);
        i+=step-1;
        // Remember the current position
        final int pos = buffer.position();
        int l=chars.position();
        // Try to decode
        buffer.flip();
        final CoderResult result = dec.decode(buffer, chars, i >= tab.length -1);
        System.out.println(result);
        if (result.isUnderflow() && chars.position() == l) {
            // Underflow, prepare the buffer for more writing
            buffer.position(pos);
        }else{
            if (buffer.position() == buffer.limit()){
                //ByteBuffer decoded
                buffer.clear();
                buffer.position(0);
            }else{
                //a part of ByteBuffer is decoded. We keep only bytes which are not decoded
                final byte[] b = buffer.array();
                final int f = buffer.position();
                final int g = buffer.limit() - buffer.position();
                buffer.clear();
                buffer.position(0);
                buffer.put(b, f, g);
            }
        }
        buffer.limit(buffer.capacity());
    }
    dec.flush(chars);
    chars.flip();
    System.out.println(chars.toString());
1 answer

The decode(ByteBuffer, CharBuffer, boolean) method returns a CoderResult, but you ignore it. If you print the result in the second code fragment:

    for (int i = 0; i < tab.length; i++) {
        ByteBuffer buffer = ByteBuffer.wrap(tab, i, 1);
        System.out.println(dec.decode(buffer, chars, i == 2));
    }

you will see this output:

    UNDERFLOW
    MALFORMED[1]
    MALFORMED[1]
    aa

The decoder reports malformed input if you start decoding in the middle of a character. It expects the first byte it reads to be the start of a valid UTF-8 sequence.
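To see what happens to each byte, here is a minimal sketch that offers the first two bytes of € to the decoder one at a time and reports the result together with the number of bytes the decoder left unconsumed. The class name PartialSequenceDemo and the helper offer are made up for this illustration; only the java.nio calls are real API.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

class PartialSequenceDemo {

    // Hypothetical helper: offers exactly one byte of tab to the decoder
    // and describes the result and how many bytes were left in the input.
    static String offer(CharsetDecoder dec, byte[] tab, int index, CharBuffer out) {
        ByteBuffer in = ByteBuffer.wrap(tab, index, 1);
        // endOfInput is false: more input may follow
        CoderResult result = dec.decode(in, out, false);
        return result + ", bytes left unconsumed: " + in.remaining();
    }

    public static void main(String[] args) {
        byte[] tab = {(byte) 0xE2, (byte) 0x82, (byte) 0xAC}; // UTF-8 bytes of the euro sign
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder();
        CharBuffer out = CharBuffer.allocate(3);

        // Lead byte alone: an incomplete sequence
        System.out.println(offer(dec, tab, 0, out));
        // Continuation byte alone: not a valid start of a sequence
        System.out.println(offer(dec, tab, 1, out));
    }
}
```

In both calls the byte stays in the input buffer. The decoder does not store the lead byte internally, so wrapping the next byte in a fresh ByteBuffer throws the lead byte away.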

Edit: When the decoder reports UNDERFLOW, it expects you to add more data to the input buffer and call decode() again, but you must offer it the data again from the beginning of the UTF-8 sequence that you are trying to decode. You cannot continue in the middle of a UTF-8 sequence.

Here is a version that works by adding one byte from tab in each iteration of the loop:

    final CharsetDecoder dec = Charset.forName("UTF-8").newDecoder();
    final CharBuffer chars = CharBuffer.allocate(3);
    final byte[] tab = new byte[]{(byte) -30, (byte) -126, (byte) -84}; //char €
    final ByteBuffer buffer = ByteBuffer.allocate(10);
    for (int i = 0; i < tab.length; i++) {
        // Add the next byte to the buffer
        buffer.put(tab[i]);
        // Remember the current position
        final int pos = buffer.position();
        // Try to decode
        buffer.flip();
        final CoderResult result = dec.decode(buffer, chars, i == 2);
        System.out.println(result);
        if (result.isUnderflow()) {
            // Underflow, prepare the buffer for more writing
            buffer.limit(buffer.capacity());
            buffer.position(pos);
        }
    }
    dec.flush(chars);
    chars.flip();
    System.out.println("a" + chars.toString() + "a");
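The loop above handles the single-character case. For input that mixes complete and incomplete sequences, one possible generalization is to call ByteBuffer.compact() after each decode(), which moves any unconsumed bytes to the front of the buffer so the next chunk is appended after them. The sketch below is only an illustration under stated assumptions: the class ChunkedDecode, the helper decodeInChunks and the buffer sizes are invented here, and the input is assumed to be UTF-8, which decodes to at most one char per byte.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

class ChunkedDecode {

    // Hypothetical helper: decodes UTF-8 bytes that arrive `step` bytes at a time.
    static String decodeInChunks(byte[] input, int step) {
        if (step <= 0) {
            throw new IllegalArgumentException("step must be positive");
        }
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder();
        // Assumption: UTF-8 never produces more chars than input bytes
        CharBuffer chars = CharBuffer.allocate(input.length);
        // Assumption: one chunk plus the few leftover bytes of an incomplete sequence
        ByteBuffer buffer = ByteBuffer.allocate(step + 4);
        try {
            for (int offset = 0; offset < input.length; offset += step) {
                int length = Math.min(step, input.length - offset);
                buffer.put(input, offset, length); // append after any leftover bytes
                buffer.flip();                     // switch the buffer to reading
                check(dec.decode(buffer, chars, false));
                buffer.compact();                  // keep unconsumed bytes, switch to writing
            }
            buffer.flip();
            check(dec.decode(buffer, chars, true)); // no more input: leftovers are an error
            check(dec.flush(chars));
        } catch (CharacterCodingException e) {
            throw new UncheckedIOException(e);
        }
        chars.flip();
        return chars.toString();
    }

    // Turns any result other than UNDERFLOW into an exception.
    private static void check(CoderResult result) throws CharacterCodingException {
        if (!result.isUnderflow()) {
            result.throwException();
        }
    }

    public static void main(String[] args) {
        byte[] tab = {(byte) 97, (byte) -30, (byte) -126, (byte) -84, (byte) 97, (byte) 97};
        for (int step = 1; step <= 3; step++) {
            System.out.println(decodeInChunks(tab, step));
        }
    }
}
```

Passing false for endOfInput inside the loop and making one final call with true keeps the end-of-input handling in a single place, and it also covers an empty input, since flush() is only legal after a decode() call that had endOfInput set to true.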