-XX:+UseCompressedStrings and Compact Strings are two different things.
-XX:+UseCompressedStrings meant that strings consisting only of ASCII characters could be stored as byte[], but the option was disabled by default. In JDK 9 a similar optimization is enabled by default. It is not controlled by that flag but is built into String itself, and it can be switched off with -XX:-CompactStrings.
Before Java 9, strings were stored in a char[] in UTF-16 encoding. From Java 9 on, they are stored in a byte[]. Why?
Because in ISO_LATIN_1 every character fits into a single byte (8 bits), compared with the 16 bits used until now, where the upper 8 bits are always zero for such characters. This only works for ISO_LATIN_1 strings, but those make up the majority of strings in typical applications.
So this is done to save space.
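A quick way to see which strings qualify: a string can use the one-byte layout only if every char is at most 0xFF. The sketch below checks that by hand and cross-checks the result with the JDK's ISO-8859-1 encoder. The class and method names are made up for illustration, and this is not necessarily the check String itself performs.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Check {
    // Hypothetical helper: a string fits the one-byte-per-char (Latin-1) layout
    // only if every UTF-16 code unit is in the range 0x00..0xFF.
    static boolean fitsInLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "first", "cafe" with an e-acute, and the Russian word for "first",
        // written with escapes to avoid source-encoding issues.
        String[] samples = {"first", "caf\u00e9", "\u043f\u0435\u0440\u0432\u044b\u0439"};
        for (int i = 0; i < samples.length; i++) {
            String s = samples[i];
            // Cross-check against the JDK's own ISO-8859-1 encoder.
            boolean viaEncoder = StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
            System.out.println("sample " + i + ": " + fitsInLatin1(s) + " / " + viaEncoder);
        }
    }
}
```

Both checks print true for the first two samples and false for the Cyrillic one.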
Here is a small example that should make everything clearer:
    class StringCharVsByte {
        public static void main(String[] args) {
            String first = "first";
            String russianFirst = "первый"; // the Russian word for "first"
            char[] c1 = first.toCharArray();
            char[] c2 = russianFirst.toCharArray();
            // Print the upper 8 bits of every UTF-16 char.
            for (char c : c1) {
                System.out.println(c >>> 8);
            }
            for (char c : c2) {
                System.out.println(c >>> 8);
            }
        }
    }
In the first case we get only zeros, which means the upper 8 bits of every char are zero. In the second case the values are nonzero, which means at least one of the upper 8 bits is set.
This means that when strings are stored internally as char arrays, every Latin-1 string wastes half of each char. Applications that keep many such strings in memory can waste a lot of space because of this.
Do you have a 10-character Latin-1 string? You just lost 80 bits, or 10 bytes. To avoid this, compact strings were introduced, and such strings no longer waste that space.
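The arithmetic can be reproduced by encoding the same 10-character string both ways. This is only a rough sketch: getBytes is used as a stand-in for the size of the backing array, object and array headers are ignored, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class WastedBytes {
    // Bytes needed to hold the characters as UTF-16 code units (the pre-Java-9 char[] payload).
    static int utf16Bytes(String s) {
        // UTF_16LE instead of UTF_16, so that no byte-order mark is added.
        return s.getBytes(StandardCharsets.UTF_16LE).length;
    }

    // Bytes needed with one byte per character (the compact Latin-1 byte[] payload).
    static int latin1Bytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        String s = "abcdefghij"; // a 10-character Latin-1 string
        int wasted = utf16Bytes(s) - latin1Bytes(s);
        System.out.println(wasted + " bytes = " + (wasted * Byte.SIZE) + " bits");
    }
}
```

This prints "10 bytes = 80 bits": 20 bytes as UTF-16 versus 10 bytes as Latin-1.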
Internally, this has some neat consequences. To distinguish between LATIN1 and UTF16 strings, String has a coder field:
private final byte coder;
Based on this, length is calculated differently:
public int length() { return value.length >> coder(); }
If the string is pure Latin-1, coder is 0 (LATIN1), so the length of value (the byte array) is exactly the number of characters. For any other string, coder is 1 (UTF16), each char takes two bytes, and the shift divides the byte length by two.
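To put the coder field and length() together, here is a toy model of the Java 9+ layout. It is a simplified, hypothetical class, not the JDK's String source: the LATIN1 = 0 / UTF16 = 1 values match the JDK constants, but the constructor logic and the little-endian byte order used for UTF-16 are assumptions made for this sketch.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified model of the compact string layout; not the JDK's actual String code.
public class MiniCompactString {
    static final byte LATIN1 = 0; // same numeric values as the JDK's coder constants
    static final byte UTF16 = 1;

    private final byte[] value;
    private final byte coder;

    MiniCompactString(String s) {
        // Use the one-byte layout only if every char fits into 8 bits.
        if (s.chars().allMatch(c -> c <= 0xFF)) {
            value = s.getBytes(StandardCharsets.ISO_8859_1); // 1 byte per char
            coder = LATIN1;
        } else {
            value = s.getBytes(StandardCharsets.UTF_16LE); // 2 bytes per char, low byte first
            coder = UTF16;
        }
    }

    // Same idea as String.length(): byte length shifted right by coder (0 or 1).
    int length() {
        return value.length >> coder;
    }

    char charAt(int i) {
        if (coder == LATIN1) {
            return (char) (value[i] & 0xFF);
        }
        int lo = value[2 * i] & 0xFF;     // low byte comes first in this sketch
        int hi = value[2 * i + 1] & 0xFF;
        return (char) ((hi << 8) | lo);
    }

    public static void main(String[] args) {
        MiniCompactString a = new MiniCompactString("first");
        // The Russian word for "first", written with escapes.
        MiniCompactString b = new MiniCompactString("\u043f\u0435\u0440\u0432\u044b\u0439");
        System.out.println(a.value.length + " bytes, length " + a.length());
        System.out.println(b.value.length + " bytes, length " + b.length());
    }
}
```

For "first" the value array holds 5 bytes and length() is 5; for the six-letter Russian word it holds 12 bytes and length() is still 6.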