Does Java String.getBytes("UTF-8") preserve lexicographic order?

If I have a lexicographically sorted list of Java strings [s1, s2, s3, s4, ..., sn] and convert each string to a byte array using UTF-8 encoding, bx = sx.getBytes("UTF-8"), is the list of byte arrays [b1, b2, b3, ..., bn] also in lexicographic order?

+4
2 answers

Yes. According to RFC 3629:

The byte-value lexicographic sorting order of UTF-8 strings is the same as if ordered by character numbers. Of course this is of limited interest since a sort order based on character numbers is almost never culturally valid.

As Ian Roberts noted, this applies to "real" UTF-8 (such as String.getBytes will give you), but beware of the modified UTF-8 used by DataInputStream, which sorts [U+0000] after [U+0001] and [U+F000] after [U+10FFFF].
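Here is a minimal sketch for checking this yourself. It assumes Java 9+ (for Arrays.compareUnsigned and Arrays.compare); the class name Utf8Order and its helper methods are made up for illustration and are not part of any library. Two caveats it tries to make visible: Java bytes are signed, so the byte arrays have to be compared as unsigned values, and String.compareTo compares UTF-16 code units rather than code points, so it can disagree with UTF-8 byte order when supplementary characters are involved.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Order {
    // Compare two strings by the unsigned bytes of their standard UTF-8 encoding.
    static int compareUtf8(String a, String b) {
        return Arrays.compareUnsigned(
                a.getBytes(StandardCharsets.UTF_8),
                b.getBytes(StandardCharsets.UTF_8));
    }

    // Compare two strings by Unicode code point ("character number" in RFC terms).
    static int compareCodePoints(String a, String b) {
        return Arrays.compare(a.codePoints().toArray(), b.codePoints().toArray());
    }

    // Encode with the modified UTF-8 written by DataOutputStream.writeUTF,
    // dropping the 2-byte length prefix that writeUTF puts in front.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Compare two strings by the unsigned bytes of their modified UTF-8 encoding.
    static int compareModifiedUtf8(String a, String b) {
        return Arrays.compareUnsigned(modifiedUtf8(a), modifiedUtf8(b));
    }

    public static void main(String[] args) {
        String bmp = "\uF000";                                       // U+F000
        String supplementary = new String(Character.toChars(0x10FFFF)); // U+10FFFF

        // Real UTF-8 byte order agrees with code point order.
        System.out.println(compareCodePoints(bmp, supplementary) < 0);   // true
        System.out.println(compareUtf8(bmp, supplementary) < 0);         // true

        // String.compareTo works on UTF-16 code units and puts U+F000 last.
        System.out.println(bmp.compareTo(supplementary) > 0);            // true

        // Modified UTF-8 also sorts U+F000 after U+10FFFF, and U+0000 after U+0001.
        System.out.println(compareModifiedUtf8(bmp, supplementary) > 0); // true
        System.out.println(compareModifiedUtf8("\u0000", "\u0001") > 0); // true
    }
}
```

So if the original list was sorted with String.compareTo and may contain characters outside the BMP, sorting by code point (or directly by the unsigned UTF-8 bytes) is the safer way to get a list whose byte arrays are also in order.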

+5

You have a list/array X of objects in a given order.

You create a new list/array Y from those objects by applying the method to each one.

Y will have whatever order you built it in (usually you just keep the order of X). No reordering occurs.

In addition, lexicographic ordering of byte[] does not make sense.

-2
