Unexpected bytes returned by getBytes()



import java.io.UnsupportedEncodingException;
import java.util.Arrays;

public class Main {
 public static void main(String[] args)
 {
  try 
  {
   String s = "s";
   System.out.println( Arrays.toString( s.getBytes("utf8") ) );
   System.out.println( Arrays.toString( s.getBytes("utf16") ) );
   System.out.println( Arrays.toString( s.getBytes("utf32") ) );
  }  
  catch (UnsupportedEncodingException e) 
  {
   e.printStackTrace();
  }
 }
}

Console:


[115]
[-2, -1, 0, 115]
[0, 0, 0, 115]

What is this?

What are the [-2, -1] bytes?

I also noticed that if I do this:


String s = new String(new char[]{'\u1251'});
System.out.println( Arrays.toString( s.getBytes("utf8") ) );
System.out.println( Arrays.toString( s.getBytes("utf16") ) );
System.out.println( Arrays.toString( s.getBytes("utf32") ) );

Console:


[-31, -119, -111]
[-2, -1, 18, 81]
[0, 0, 18, 81]
4 answers

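-2, -1 (0xFE 0xFF) is the byte order mark (BOM, U+FEFF), which indicates that the text that follows is encoded in UTF-16.

You get it because UTF-8 has only one byte order, whereas UTF-16 has two variants, UTF-16BE and UTF-16LE, in which the two bytes of each 16-bit value are stored in big-endian or little-endian order. When you ask for plain "UTF-16", the BOM tells the reader which of the two was used. (UTF-32 also has big- and little-endian forms, but Java's "UTF-32" encoder writes big-endian without a BOM, which is why the third line of output has no -2, -1.)

Since the returned bytes are 0xFE 0xFF, the text is big-endian (UTF-16BE).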
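To make the byte-order difference visible, here is a small sketch that encodes the question's `'\u1251'` with all three of Java's UTF-16 charsets. The class name `Utf16Variants` is just for illustration, and `StandardCharsets` (Java 7+) is used instead of string names so no checked exception is involved.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Encodes one character with Java's three UTF-16 charsets to show the byte order.
class Utf16Variants {
    public static void main(String[] args) {
        String s = "\u1251"; // the character from the question, code unit 0x1251

        // "UTF-16": Java writes a big-endian BOM (FE FF), then big-endian code units.
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.UTF_16)));   // [-2, -1, 18, 81]
        // "UTF-16BE": big-endian, no BOM (0x12 = 18, 0x51 = 81).
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.UTF_16BE))); // [18, 81]
        // "UTF-16LE": little-endian, no BOM, so the two bytes are swapped.
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.UTF_16LE))); // [81, 18]
    }
}
```

The first line matches the output in the question; the BOM is the only difference between it and the UTF-16BE line.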


Bytes are signed in Java, so -2, -1 are really 0xfe 0xff... which is U+FEFF, the Unicode byte order mark (BOM)... used in UTF-16.
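One way to see this is to print the bytes as unsigned hex and glue the first two back into a 16-bit value. This is a minimal sketch; `BomBytes` and its `toHex` helper are hypothetical names, not part of any library.

```java
import java.nio.charset.StandardCharsets;

// Shows that the signed bytes -2, -1 are 0xFE 0xFF, i.e. the code point U+FEFF.
class BomBytes {
    // Hypothetical helper: formats each byte as two unsigned hex digits.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF)); // & 0xFF discards the sign extension
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = "s".getBytes(StandardCharsets.UTF_16);
        System.out.println(toHex(bytes)); // FE FF 00 73

        // Combine the first two bytes, big-endian, into one 16-bit value.
        int first = ((bytes[0] & 0xFF) << 8) | (bytes[1] & 0xFF);
        System.out.printf("U+%04X%n", first); // U+FEFF
    }
}
```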

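To avoid the BOM, use UTF-16BE or UTF-16LE explicitly. (Also consider using the standard charset names such as "UTF-8" rather than aliases like "utf8".)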
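Applied to the question's program, a sketch might look like the following. The class name `NoBom` is illustrative, and switching to `StandardCharsets` constants is my own choice rather than something the answer states: `String.getBytes(Charset)` does not declare `UnsupportedEncodingException`, so the try/catch disappears.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// The question's program with the byte order named explicitly, so no BOM is written.
class NoBom {
    public static void main(String[] args) {
        String s = "s";
        // getBytes(Charset) cannot throw UnsupportedEncodingException; no try/catch needed.
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.UTF_8)));    // [115]
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.UTF_16BE))); // [0, 115]
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.UTF_16LE))); // [115, 0]
    }
}
```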


-2, -1 is the UTF-16 "byte order mark" (BOM). The values are negative because Java's byte is signed, with a range of -128 to +127.
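A quick sketch of the signed range and how to get the 0..255 value back; `SignedBytes` is an illustrative class name, and `Byte.toUnsignedInt` requires Java 8 or later.

```java
// Java's byte is a signed 8-bit two's-complement type,
// so bit patterns 0x80..0xFF print as negative numbers.
class SignedBytes {
    public static void main(String[] args) {
        System.out.println(Byte.MIN_VALUE + " .. " + Byte.MAX_VALUE); // -128 .. 127

        byte fe = (byte) 0xFE; // the cast keeps only the low 8 bits
        byte ff = (byte) 0xFF;
        System.out.println(fe + ", " + ff); // -2, -1

        // Byte.toUnsignedInt recovers the unsigned value.
        System.out.println(Byte.toUnsignedInt(fe) + ", " + Byte.toUnsignedInt(ff)); // 254, 255
    }
}
```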


A byte in Java is a signed type, so negative values are normal.
