How to build a string based on UTF-8?

I think I can use \u**** to build a character based on UTF-16. How do I build a string using UTF-8?

+4
2 answers

Strings in Java are encoding-agnostic (they use UTF-16 internally, but that doesn't matter here). The values you enter after \u are Unicode code points (strictly, UTF-16 code units, which coincide with the code point for any character up to U+FFFF); they are not the binary representation of the character in any particular encoding. Each character has an associated code point, and each encoding defines how code points are mapped to bytes.

In this case, you create the string from code points and then convert it to whichever encoding you need with the getBytes() method. For example, the euro sign (€):

    "\u20AC".getBytes("UTF-8");  // -30, -126, -84
    "\u20AC".getBytes("UTF-16"); // -2, -1, 32, -84
    "\u20AC".getBytes("UTF-32"); // 0, 0, 32, -84

Keep in mind that UTF-16 does not always use 16 bits per character: code points above U+FFFF are encoded as a surrogate pair of two 16-bit code units.
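
To illustrate the point about UTF-16 width, here is a minimal sketch that compares a character inside the Basic Multilingual Plane (the euro sign) with one outside it (U+1F600). The class name SurrogateDemo and its helper methods are made up for this example; only the String, Character, and StandardCharsets calls are standard library.

```java
import java.nio.charset.StandardCharsets;

public class SurrogateDemo {
    // Number of UTF-16 code units (Java chars) the string occupies.
    public static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points, counting a surrogate pair as one.
    public static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes the string needs when encoded as UTF-8.
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String euro = "\u20AC";                               // U+20AC, inside the BMP
        String grin = new String(Character.toChars(0x1F600)); // U+1F600, outside the BMP

        // Euro sign: 1 code unit, 1 code point, 3 UTF-8 bytes.
        System.out.println(utf16Units(euro) + " " + codePoints(euro) + " " + utf8Length(euro));
        // U+1F600: 2 code units (a surrogate pair), 1 code point, 4 UTF-8 bytes.
        System.out.println(utf16Units(grin) + " " + codePoints(grin) + " " + utf8Length(grin));
    }
}
```

This is also why length() can overcount "characters": it reports code units, not code points.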

+8

The accepted answer is useful, but it really doesn't say how to build a string from UTF-8 data.

Just in case someone wants to know the answer, here it is:

    byte[] bytes = ...; // UTF-8 bytes
    String string = new String(bytes, "UTF-8");
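
As a runnable variation, the sketch below decodes the three UTF-8 bytes of the euro sign back into a string. It assumes Java 7 or later and uses StandardCharsets.UTF_8 instead of the charset name, which avoids the checked UnsupportedEncodingException. The class name Utf8Decode and the helper decodeUtf8 are illustrative names, not part of any library.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Decode {
    // Decodes a UTF-8 byte sequence into a String.
    // Malformed input is replaced with U+FFFD rather than raising an exception.
    public static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // UTF-8 encoding of the euro sign: E2 82 AC (-30, -126, -84 as signed bytes).
        byte[] euroBytes = { (byte) 0xE2, (byte) 0x82, (byte) 0xAC };
        String s = decodeUtf8(euroBytes);

        System.out.println(s.equals("\u20AC")); // prints "true"
        System.out.println(s.length());         // prints "1"
    }
}
```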
+4
