How to build a string based on UTF8?

Question

How to build a string based on UTF8?

I think I can use \u**** to build a character based on UTF-16. How do I build a string from UTF-8?

+4

java unicode

Adam lee Jun 09 '12 at 10:43


2 answers

The accepted answer is useful, but it really doesn't say how to build a string from UTF-8 data.

Just in case someone wants to know the answer, here it is:

 byte[] bytes = ...; // UTF-8 bytes.
 String string = new String(bytes, "UTF-8");
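For a version that compiles and runs on its own, here is a minimal sketch. The class name `Utf8Decode`, the helper `decode`, and the sample bytes are illustrative assumptions, not part of the original answer. It uses `StandardCharsets.UTF_8` (available since Java 7) instead of the `"UTF-8"` string, which avoids the checked `UnsupportedEncodingException`.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Decode {
    // Hypothetical helper: decodes a UTF-8 byte sequence into a Java String.
    static String decode(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // E2 82 AC is the UTF-8 encoding of the euro sign (U+20AC).
        byte[] euro = {(byte) 0xE2, (byte) 0x82, (byte) 0xAC};
        String s = decode(euro);

        System.out.println(s.equals("\u20AC")); // true
        System.out.println(s.length());          // 1 (a single UTF-16 code unit)
    }
}
```

Note that malformed input does not throw with this constructor; invalid byte sequences are replaced with U+FFFD. If strict validation matters, a `CharsetDecoder` is probably the better tool.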

+4

Mark byers Jun 09 '12 at 22:46


Tomasz Nurkiewicz · Accepted Answer · 2012-06-09T22:48:22+0000

Strings in Java are encoding-agnostic (they use UTF-16 internally, but that doesn't matter here). The codes that you enter after \u are Unicode code points (strictly speaking, UTF-16 code units, which coincide with the code point for every character in the Basic Multilingual Plane); they are not the actual binary representation of characters. Each character has an associated code point. Different encodings determine how code points are mapped to a binary representation.

In this case, you create a string using code points and then convert it to whatever encoding you need using the getBytes() method. For example, the euro sign (€):

 "\u20AC".getBytes("UTF-8");  // -30, -126, -84
 "\u20AC".getBytes("UTF-16"); // -2, -1, 32, -84
 "\u20AC".getBytes("UTF-32"); // 0, 0, 32, -84

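It is worth remembering: UTF-16 does not actually use 16 bits all the time! Characters outside the Basic Multilingual Plane are encoded as a surrogate pair, that is, two 16-bit code units.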
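To make the point about UTF-16 concrete, here is a small sketch using a character beyond U+FFFF. The class name `SurrogateDemo`, the helper `fromCodePoint`, and the choice of U+1D11E (MUSICAL SYMBOL G CLEF) are assumptions made for illustration. It builds the string from the code point with `Character.toChars` and compares the number of `char`s, code points, and encoded bytes.

```java
import java.nio.charset.StandardCharsets;

public class SurrogateDemo {
    // Hypothetical helper: builds a String from a single Unicode code point.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        // U+1D11E lies outside the Basic Multilingual Plane.
        String clef = fromCodePoint(0x1D11E);

        System.out.println(clef.length());                                   // 2 (a surrogate pair)
        System.out.println(clef.codePointCount(0, clef.length()));           // 1 (one code point)
        System.out.println(clef.getBytes(StandardCharsets.UTF_16BE).length); // 4 (two 16-bit units)
        System.out.println(clef.getBytes(StandardCharsets.UTF_8).length);    // 4
    }
}
```

`UTF_16BE` is used here rather than `"UTF-16"` so that the byte count is not inflated by the two-byte byte order mark that Java's `"UTF-16"` encoder prepends.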
