4-byte Unicode character in Java

I am writing unit tests for my custom StringDatatype, and I need to write a 4-byte Unicode character. "\U" does not work (illegal escape character error). For example: U+1F701 (0xf0 0x9f 0x9c 0x81). How can this be written in a string?

1 answer

A Unicode code point is not 4 bytes; it is an integer (ranging from U+0000 to U+10FFFF).

Your 4 bytes are (wild guess) its UTF-8 encoding (edit: I was right).

You need to do this:

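import java.nio.charset.StandardCharsets;
// Convert the code point to its UTF-16 code units (two chars for U+1F701)
final char[] chars = Character.toChars(0x1F701);
final String s = new String(chars);
// Encode as UTF-8; for U+1F701 this gives 0xf0 0x9f 0x9c 0x81
final byte[] asBytes = s.getBytes(StandardCharsets.UTF_8);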

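When Java was created, Unicode did not define code points outside the BMP (that is, U+0000 to U+FFFF), which is why a char is only 16 bits long (this is a guess, but probably not far off). Since then, Java has had to adapt: code points outside the BMP need two chars (a leading surrogate and a trailing surrogate, which Java calls the high and low surrogate respectively). There is no character literal in Java that lets you enter a code point outside the BMP directly.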
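To make the two-char representation concrete, here is a minimal sketch of how a code point outside the BMP splits into a high and a low surrogate. The class and method names (`SurrogateSplit`, `split`) are hypothetical and only for illustration; the arithmetic is the standard UTF-16 rule, and `Character.toChars` is the JDK call that does the same job.

```java
import java.util.Arrays;

public class SurrogateSplit {
    // Hypothetical helper: splits a code point into its UTF-16 code units by hand.
    static char[] split(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            // Code points in the BMP fit in a single char.
            return new char[] { (char) codePoint };
        }
        int v = codePoint - 0x10000;               // 20-bit value
        char high = (char) (0xD800 + (v >> 10));   // top 10 bits
        char low = (char) (0xDC00 + (v & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        char[] manual = split(0x1F701);
        char[] jdk = Character.toChars(0x1F701);
        System.out.printf("%04X %04X%n", (int) manual[0], (int) manual[1]);
        System.out.println(Arrays.equals(manual, jdk));
    }
}
```

For U+1F701 the two code units should be D83D and DF01, matching what `Character.toChars` returns.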

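Given that a char is, in fact, a UTF-16 code unit, and that there are string literal escapes for these, you can enter this "character" in a String as "\uD83D\uDF01", or directly as the symbol itself if your environment supports it.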
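As a quick check for a unit test, the following sketch compares the escaped literal with the string built from the code point, and looks at the UTF-8 bytes from the question. The class name `SurrogatePairDemo` and its helpers are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class SurrogatePairDemo {
    // U+1F701 written as a literal made of its two UTF-16 code units.
    static final String LITERAL = "\uD83D\uDF01";

    // The same code point built programmatically.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String built = fromCodePoint(0x1F701);
        System.out.println(LITERAL.equals(built)); // both forms are the same string
        System.out.println(built.length());        // counts chars, not code points
        for (byte b : utf8(built)) {
            System.out.printf("%02x ", b & 0xFF);
        }
        System.out.println();
    }
}
```

Note that `length()` reports 2 here, because it counts UTF-16 code units rather than code points.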

See also the CharsetDecoder and CharsetEncoder classes.
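One reason to reach for CharsetEncoder in tests is that, unlike String.getBytes(), a freshly created encoder reports malformed input such as an unpaired surrogate instead of silently replacing it. The sketch below assumes that is the behaviour you want; `StrictUtf8` and `encodeStrict` are hypothetical names, and wrapping the checked exception in an IllegalArgumentException is just a convenience choice.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class StrictUtf8 {
    // Hypothetical helper: encodes to UTF-8 and fails on invalid UTF-16 input.
    static byte[] encodeStrict(String s) {
        // A new encoder reports malformed input by default.
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        try {
            ByteBuffer buf = encoder.encode(CharBuffer.wrap(s));
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return out;
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("not valid UTF-16: " + e, e);
        }
    }

    public static void main(String[] args) {
        String ok = new String(Character.toChars(0x1F701));
        System.out.println(encodeStrict(ok).length);

        String loneHigh = String.valueOf((char) 0xD83D); // unpaired high surrogate
        try {
            encodeStrict(loneHigh);
            System.out.println("encoded");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected");
        }
    }
}
```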

See also String.codePointCount() and, since Java 8, String.codePoints() (inherited from CharSequence).
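A short sketch of how these two methods differ from length() on a string containing U+1F701. The class name `CodePointCounting` and its helpers are hypothetical; Java 8 or later is assumed for codePoints().

```java
public class CodePointCounting {
    // Number of Unicode code points, as opposed to UTF-16 code units.
    static int countCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // All code points of the string, in order (Java 8+).
    static int[] codePointsOf(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        String s = "a" + new String(Character.toChars(0x1F701)) + "b";
        System.out.println("chars: " + s.length());
        System.out.println("code points: " + countCodePoints(s));
        for (int cp : codePointsOf(s)) {
            System.out.printf("U+%04X%n", cp);
        }
    }
}
```

For this input, length() counts 4 chars while codePointCount() counts 3 code points, which is the distinction a test for a custom string type will usually need to pin down.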
