All numeric types in the JVM are signed, with char as the only unsigned "number". When a number is signed, the most significant bit indicates its sign: 0 represents a non-negative number (positive or zero), and 1 represents a negative number. In addition, negative values are stored in an inverted order (technically known as two's complement notation) relative to the order in which positive numbers increment. For example, non-negative byte values are represented in bits as follows:
    00 00 00 00 => (byte) 0
    00 00 00 01 => (byte) 1
    00 00 00 10 => (byte) 2
    ...
    01 11 11 11 => (byte) Byte.MAX_VALUE
while the bit order for negative numbers is inverted:
    11 11 11 11 => (byte) -1
    11 11 11 10 => (byte) -2
    11 11 11 01 => (byte) -3
    ...
    10 00 00 00 => (byte) Byte.MIN_VALUE
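The bit patterns listed above can be checked with a small program. This is only a sketch: the class name ByteBits and the helper bits are made up for illustration, and the helper masks the byte with 0xFF so that only its own eight bits are printed.

```java
public class ByteBits {
    // Hypothetical helper: renders the 8 bits of a byte as a zero-padded string.
    // The mask 0xFF removes the upper bits that appear when the byte is promoted to int.
    static String bits(byte b) {
        String s = Integer.toBinaryString(b & 0xFF);
        return String.format("%8s", s).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println(bits((byte) 0));       // 00000000
        System.out.println(bits((byte) 1));       // 00000001
        System.out.println(bits(Byte.MAX_VALUE)); // 01111111
        System.out.println(bits((byte) -1));      // 11111111
        System.out.println(bits((byte) -2));      // 11111110
        System.out.println(bits(Byte.MIN_VALUE)); // 10000000
    }
}
```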
This inverted notation also explains why the negative range can hold one more number than the positive range: the non-negative half also has to include the representation of the number 0. Remember that all of this is only a matter of interpreting a bit pattern. Negative numbers could be encoded in other ways, but this inverted notation is very convenient because it allows for some rather fast conversions, as a small example later on shows.
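The asymmetry of the range and the "invert and add one" nature of two's complement can be observed directly. The following is a minimal sketch; the class TwosComplement and the helper negate are illustrative names and not part of any library.

```java
public class TwosComplement {
    // Hypothetical helper: negates a value the two's complement way,
    // by flipping every bit and adding one.
    static int negate(int x) {
        return ~x + 1;
    }

    public static void main(String[] args) {
        System.out.println(Byte.MIN_VALUE); // -128
        System.out.println(Byte.MAX_VALUE); // 127

        System.out.println(negate(5));      // -5
        System.out.println(negate(-5));     // 5

        // +128 has no byte representation, so negating Byte.MIN_VALUE
        // and narrowing the result lands on Byte.MIN_VALUE again.
        byte negatedMin = (byte) -Byte.MIN_VALUE;
        System.out.println(negatedMin);     // -128
    }
}
```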
As already mentioned, this does not apply to the char type. The char type represents a Unicode character and has a non-negative "numeric range" from 0 to 65535. Each of these numbers refers to a 16-bit Unicode value.
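A short sketch of the char range, assuming nothing beyond the standard constants Character.MIN_VALUE and Character.MAX_VALUE; the class CharRange and the helper numericValue are made-up names.

```java
public class CharRange {
    // Hypothetical helper: returns the numeric value behind a char.
    static int numericValue(char c) {
        return c;
    }

    public static void main(String[] args) {
        System.out.println(numericValue(Character.MIN_VALUE)); // 0
        System.out.println(numericValue(Character.MAX_VALUE)); // 65535
        System.out.println(numericValue('A'));                 // 65

        // One past the maximum wraps around to 0 instead of becoming negative.
        char wrapped = (char) (Character.MAX_VALUE + 1);
        System.out.println(numericValue(wrapped));             // 0
    }
}
```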
When converting between the integral types int, byte, short and char, the JVM must either add or truncate bits. (A boolean cannot be cast to or from these types in Java source code.)
If the target type is represented by more bits than the type it is converted from, the JVM simply fills the additional slots with the value of the highest bit of the given value (the bit that represents the sign):
    |    short    |    byte     |
    |             | 00 00 00 01 | => (byte) 1
    | 00 00 00 00 | 00 00 00 01 | => (short) 1
Thanks to the inverted notation, this strategy also works for negative numbers:
    |    short    |    byte     |
    |             | 11 11 11 11 | => (byte) -1
    | 11 11 11 11 | 11 11 11 11 | => (short) -1
Thus, the sign of the value is preserved. Without going into the details of how the JVM implements this, note that this model allows a cast to be performed by a cheap shift operation, which is obviously advantageous.
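To make the shift idea concrete, here is a sketch that reproduces sign extension with a left shift followed by an arithmetic right shift. It is not a claim about how any particular JVM implements the cast; the class SignExtend and the helper signExtend8 are hypothetical names used only for illustration.

```java
public class SignExtend {
    // Hypothetical helper: treats the low 8 bits of raw as a byte and
    // sign-extends them to 32 bits. The left shift moves the byte's sign bit
    // into the int's sign position; the arithmetic right shift (>>) then
    // copies that bit into all upper positions.
    static int signExtend8(int raw) {
        return (raw << 24) >> 24;
    }

    public static void main(String[] args) {
        System.out.println(signExtend8(0x01)); // 1
        System.out.println(signExtend8(0xFF)); // -1
        System.out.println(signExtend8(0x80)); // -128

        // The built-in widening conversion gives the same results.
        byte b = (byte) -1;
        short s = b;
        System.out.println(s);                 // -1
    }
}
```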
An exception to this rule is the widening of a char, which is, as said, unsigned. A conversion from a char is always applied by filling the additional bits with 0, because there is no sign and therefore no need for an inverted notation. Converting a char to an int is therefore performed as:
    |            int            |    char     |    byte     |
    |             |             | 11 11 11 11 | 11 11 11 11 | => (char) \uFFFF
    | 00 00 00 00 | 00 00 00 00 | 11 11 11 11 | 11 11 11 11 | => (int) 65535
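The difference between zero extension and sign extension shows up when the same 16-bit pattern is widened once as a char and once as a short. A minimal sketch with illustrative names (CharWidening, widenChar, widenShort):

```java
public class CharWidening {
    // Hypothetical helpers: both rely on Java's implicit widening to int.
    static int widenChar(char c) {
        return c; // zero-extended: upper 16 bits become 0
    }

    static int widenShort(short s) {
        return s; // sign-extended: upper 16 bits copy the sign bit
    }

    public static void main(String[] args) {
        char c = '\uFFFF';        // all 16 bits set
        short s = (short) 0xFFFF; // the same 16 bits, read as a signed value

        System.out.println(widenChar(c));  // 65535
        System.out.println(widenShort(s)); // -1

        // Masking a short with 0xFFFF yields the zero-extended value as well.
        System.out.println(s & 0xFFFF);    // 65535
    }
}
```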
If the source type has more bits than the target type, the additional bits are simply cut off. As long as the original value fits into the range of the target type, this works fine, as for example in the following short to byte conversions:
    |    short    |    byte     |
    | 00 00 00 00 | 00 00 00 01 | => (short) 1
    |             | 00 00 00 01 | => (byte) 1
    | 11 11 11 11 | 11 11 11 11 | => (short) -1
    |             | 11 11 11 11 | => (byte) -1
However, if the value is too large or too small, this no longer works:
    |    short    |    byte     |
    | 00 00 00 01 | 00 00 00 01 | => (short) 257
    |             | 00 00 00 01 | => (byte) 1
    | 11 11 11 11 | 00 00 00 00 | => (short) -256
    |             | 00 00 00 00 | => (byte) 0
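Truncation is easy to reproduce. In this sketch the class Narrowing and the helper narrow are illustrative names; the cast itself is the ordinary Java narrowing conversion, which keeps only the low eight bits.

```java
public class Narrowing {
    // Hypothetical helper: narrows a short to a byte by dropping the upper 8 bits.
    static byte narrow(short s) {
        return (byte) s;
    }

    public static void main(String[] args) {
        // Values that fit into a byte survive unchanged.
        System.out.println(narrow((short) 1));      // 1
        System.out.println(narrow((short) -1));     // -1

        // Values outside the byte range keep only their low byte.
        System.out.println(narrow((short) 257));    // 1
        System.out.println(narrow((short) 0xFF00)); // 0, since the low byte is all zeros
        System.out.println(narrow((short) 128));    // -128, since the low byte is 10 00 00 00
    }
}
```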
This is why narrowing casts sometimes lead to strange results. You may wonder why narrowing is done this way. One could argue that it would be more intuitive if the JVM checked the range of the number and instead cast an out-of-range number to the largest representable value of the same sign. However, this would require branching, which is a costly operation. This matters all the more because two's complement notation allows for cheap arithmetic operations.
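For comparison, the "more intuitive" range-checked behavior can be written by hand. This is only a sketch of that alternative, not something the JVM offers as a cast; the class SaturatingCast and the helper saturate are hypothetical. Note the two comparisons that a plain truncating cast does not need.

```java
public class SaturatingCast {
    // Hypothetical helper: clamps out-of-range values to the nearest byte
    // bound instead of truncating bits. Each check is a branch.
    static byte saturate(int value) {
        if (value > Byte.MAX_VALUE) {
            return Byte.MAX_VALUE;
        }
        if (value < Byte.MIN_VALUE) {
            return Byte.MIN_VALUE;
        }
        return (byte) value;
    }

    public static void main(String[] args) {
        System.out.println(saturate(257));    // 127
        System.out.println((byte) 257);       // 1 (plain truncation)

        System.out.println(saturate(-32512)); // -128
        System.out.println((byte) -32512);    // 0 (plain truncation)

        System.out.println(saturate(5));      // 5
    }
}
```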