Why does '(int) (char) (byte) -2' produce 65534 in Java?

I came across this question in a technical test at work. Given the following code example:

public class Manager {
    public static void main(String args[]) {
        System.out.println((int) (char) (byte) -2);
    }
}

It outputs the result as 65534.

This behavior shows up only for negative values; 0 and positive numbers print unchanged, that is, exactly the value passed to System.out.println. The byte cast here seems to make no difference; I tried without it and got the same result.
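For example, a quick sanity check (the class name is arbitrary):

 public class CastCheck {
     public static void main(String[] args) {
         System.out.println((int) (char) (byte) -2);  // 65534
         System.out.println((int) (char) (byte) 0);   // 0
         System.out.println((int) (char) (byte) 2);   // 2
     }
 }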

So my question is: what exactly is going on here?

+68
java casting
Jul 08 '14 at 15:35
4 answers

There are a few prerequisites we need to agree on before you can understand what is happening here. Once you understand the following points, the rest is a straightforward conclusion:

  • All primitive types in the JVM are represented as a sequence of bits. The int type is represented by 32 bits, the char and short types are 16 bits, and the byte type is 8 bits.

  • All JVM numbers are signed, char being the only unsigned "number". When a number is signed, the most significant bit indicates the sign of that number: for this highest bit, 0 represents a non-negative number (positive or zero) and 1 represents a negative number. In addition, for signed numbers, negative values are counted in an inverted order relative to the positive numbers (technically known as two's complement notation). For example, positive byte values are represented in bits as follows:

     00 00 00 00 => (byte) 0
     00 00 00 01 => (byte) 1
     00 00 00 10 => (byte) 2
     ...
     01 11 11 11 => (byte) Byte.MAX_VALUE

    while the bit order for negative numbers is inverted:

     11 11 11 11 => (byte) -1
     11 11 11 10 => (byte) -2
     11 11 11 01 => (byte) -3
     ...
     10 00 00 00 => (byte) Byte.MIN_VALUE

    This inverted notation also explains why the negative range holds one extra number compared to the positive range: the positive range also has to include a representation of the number 0. Remember that this is merely a matter of how a bit pattern is interpreted. You could encode negative numbers differently, but this inverted notation is very convenient because it allows for some rather cheap conversions, as we will see in a small example later.

    As already mentioned, none of this applies to the char type. The char type represents a Unicode character with a non-negative "numeric range" from 0 to 65535. Each of these numbers refers to a 16-bit Unicode value.

  • When converting between the int, byte, short, char and boolean types, the JVM must either add or truncate bits.

    If the target type is represented by more bits than the source type, the JVM simply fills the extra slots with the value of the highest bit of the given value (which represents the sign):

     |    short    |    byte     |
     |             | 00 00 00 01 | => (byte) 1
     | 00 00 00 00 | 00 00 00 01 | => (short) 1

    Thanks to the inverted notation, this strategy also works for negative numbers:

     |    short    |    byte     |
     |             | 11 11 11 11 | => (byte) -1
     | 11 11 11 11 | 11 11 11 11 | => (short) -1

    Thus, the sign of the value is preserved. Without going into the details of how the JVM implements this, note that this model allows casts to be applied by a cheap shift operation, which is obviously beneficial.

    An exception to this rule is the widening of a char which, as we said, is unsigned. A widening conversion from char is always done by filling the extra bits with 0, because, as we said, there is no sign and therefore no need for the inverted notation. The conversion of a char to an int is therefore done as follows:

     |           int           |    char     |    byte     |
     |                         | 11 11 11 11 | 11 11 11 11 | => (char) \uFFFF
     | 00 00 00 00 00 00 00 00 | 11 11 11 11 | 11 11 11 11 | => (int) 65535

    If the source type has more bits than the target type, the extra bits are simply truncated. As long as the original value fits into the target type, this works fine, as for example in the following short-to-byte conversions:

     |    short    |    byte     |
     | 00 00 00 00 | 00 00 00 01 | => (short) 1
     |             | 00 00 00 01 | => (byte) 1
     | 11 11 11 11 | 11 11 11 11 | => (short) -1
     |             | 11 11 11 11 | => (byte) -1

    However, if the value is too large or too small, this no longer works:

     |    short    |    byte     |
     | 00 00 00 01 | 00 00 00 01 | => (short) 257
     |             | 00 00 00 01 | => (byte) 1
     | 10 00 00 01 | 00 00 00 00 | => (short) -32512
     |             | 00 00 00 00 | => (byte) 0

    This is why narrowing casts sometimes lead to strange results. You may wonder why narrowing is done this way. You could argue that it would be more intuitive if the JVM checked the numeric range and clamped an out-of-range number to the largest representable value of the same sign. That, however, would require branching, which is a comparatively expensive operation. This matters all the more because the two's complement notation allows arithmetic operations to be performed cheaply. (A small sketch for observing the widening and narrowing behavior yourself follows after this list.)
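As a quick way to observe the widening rules above, here is a minimal sketch (the class and helper names are my own, chosen for illustration) that prints the 32-bit two's complement pattern each value is widened to via Integer.toBinaryString:

 public class CastBits {

     // Illustrative helper: left-pads the binary string of an int to 32 digits.
     static String bits32(int value) {
         return String.format("%32s", Integer.toBinaryString(value)).replace(' ', '0');
     }

     public static void main(String[] args) {
         byte  b = -1;
         short s = b;            // widening: the sign bit is copied into the new high bits
         char  c = '\uFFFF';
         int   i = c;            // widening from char: the new high bits are filled with 0

         System.out.println(bits32(b) + " => (byte)  " + b);  // all ones, -1
         System.out.println(bits32(s) + " => (short) " + s);  // all ones, -1
         System.out.println(bits32(i) + " => (int)   " + i);  // 16 zeros, 16 ones, 65535

         short big = 257;                 // 00000001 00000001
         System.out.println((byte) big);  // 1: the high byte is truncated away
     }
 }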

With all this information, we can see what happens to the number -2 in your example:

 |           int           |    char     |    byte     |
 | 11 11 11 11 11 11 11 11 | 11 11 11 11 | 11 11 11 10 | => (int) -2
 |                         |             | 11 11 11 10 | => (byte) -2
 |                         | 11 11 11 11 | 11 11 11 10 | => (char) \uFFFE
 | 00 00 00 00 00 00 00 00 | 11 11 11 11 | 11 11 11 10 | => (int) 65534

As you can see, the cast to byte is redundant, since the cast to char would truncate away the same bits.
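If you want to reproduce this walkthrough yourself, the following sketch (the class and helper names are arbitrary) prints each intermediate value together with its bit pattern:

 public class MinusTwo {

     // Illustrative helper: 32-bit two's complement pattern, left-padded with zeros.
     static String bits(int value) {
         return String.format("%32s", Integer.toBinaryString(value)).replace(' ', '0');
     }

     public static void main(String[] args) {
         int  start  = -2;
         byte b      = (byte) start;   // truncated to 8 bits: 11111110
         char c      = (char) b;       // sign-extended to int, then truncated to 16 bits: 0xFFFE
         int  result = c;              // zero-extended to 32 bits: 65534

         System.out.println(bits(start)  + " => (int)  " + start);
         System.out.println(bits(b)      + " => (byte) " + b);
         System.out.println(bits(c)      + " => (char) 0x" + Integer.toHexString(c).toUpperCase() + " (" + (int) c + ")");
         System.out.println(bits(result) + " => (int)  " + result);
     }
 }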

All of this is also specified by the JVMS, in case you prefer a more formal definition of these rules.

One final note: the bit size of a type does not necessarily reflect the number of bits the JVM reserves for that type in memory. In fact, the JVM does not distinguish between the types boolean, byte, short, char and int. All of them are represented by the same JVM type, and the virtual machine merely emulates these casts. On the method operand stack (i.e., for any variable inside a method), all values of the named types consume 32 bits. This is, however, not true for arrays and object fields, which any JVM implementer may handle as it sees fit.
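If you want to see this for yourself, a small experiment (the file and class name are my own choice) is to compile a class and inspect it with javap -c; the cast chain should show up as the int-based instructions i2b and i2c, with the value living on the operand stack as an int the whole time:

 // Save as Casts.java, then run: javac Casts.java && javap -c Casts
 public class Casts {
     // Expected to compile to roughly: iload_0, i2b, i2c, ireturn
     static int castChain(int i) {
         return (char) (byte) i;
     }
 }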

+128
Jul 08 '14 at 17:00

There are two important things here:

  • a char is unsigned and cannot be negative
  • casting a byte to a char first involves a hidden cast to int, as required by the Java Language Spec.

So casting -2 to int gives us 11111111 11111111 11111111 11111110. Note that the padded high bits are filled with ones (sign extension); this only happens for negative values. When we then narrow it down to char, the int is truncated to

 11111111 11111110

Finally, casting 11111111 11111110 back to an int extends the bits with zeros, not ones, because the value is now considered non-negative (a char can never be negative). Thus, the widening leaves the bit pattern as it is, but unlike the original negative value it is now interpreted as a positive number. Printed in decimal, that binary value is 65534.
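To check those bit patterns yourself, a small sketch along these lines (the variable names are just illustrative) prints them out:

 public class SignExtension {
     public static void main(String[] args) {
         int asInt = -2;
         char narrowed = (char) asInt;   // truncated to the low 16 bits
         int widenedBack = narrowed;     // zero-extended, since char is unsigned

         System.out.println(Integer.toBinaryString(asInt));     // 11111111111111111111111111111110
         System.out.println(Integer.toBinaryString(narrowed));  // 1111111111111110 (leading zeros omitted)
         System.out.println(widenedBack);                       // 65534
     }
 }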

+35
Jul 08 '14 at 15:44

A char has a value from 0 to 65535, so when you cast a negative number to char, the result is the same as adding 65536 to it (equivalently, subtracting its magnitude from 65536); here that gives 65534. If you printed it as a char, it would try to display whatever Unicode character 65534 represents, but since you then cast it back to an int, you actually see 65534. If you started with a number greater than 65536, you would see similarly "confusing" results in which a large number (for example, 65538) ends up small (2).
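In other words, the cast reduces the value modulo 65536. A short sketch (the input values are just examples) makes that concrete:

 public class Modulo {
     public static void main(String[] args) {
         int[] inputs = { -2, 0, 2, 65538 };
         for (int n : inputs) {
             int viaChar = (int) (char) n;           // what the cast produces
             int viaMod  = Math.floorMod(n, 65536);  // the value reduced modulo 2^16
             System.out.println(n + " -> " + viaChar + " (floorMod: " + viaMod + ")");
         }
     }
 }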

+30
Jul 08 '14 at 15:39

I think the easiest way to explain this is simply to break it down into the order of the operations you perform:

 Instance |#    int     |             |    char     |#   byte     | result
 Source   | 11 11 11 11 | 11 11 11 11 | 11 11 11 11 | 11 11 11 10 | -2
 byte     |(11 11 11 11)|(11 11 11 11)|(11 11 11 11)| 11 11 11 10 | -2
 int      | 11 11 11 11 | 11 11 11 11 | 11 11 11 11 | 11 11 11 10 | -2
 char     |(00 00 00 00)|(00 00 00 00)| 11 11 11 11 | 11 11 11 10 | 65534
 int      | 00 00 00 00 | 00 00 00 00 | 11 11 11 11 | 11 11 11 10 | 65534
  • You start by taking a 32-bit signed value.
  • Then you convert it to an 8-bit signed value.
  • When you try to convert that to a 16-bit unsigned value, the compiler first sneaks in a quick widening conversion to a 32-bit signed value,
  • then narrows it to 16 bits without preserving the sign.
  • When the final conversion to 32 bits happens, there is no longer a sign, so the value is padded with zero bits and its numeric value is preserved (see the sketch right after this list).
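Spelled out with explicit intermediate variables (a minimal sketch; the names are arbitrary), those steps look like this:

 public class StepByStep {
     public static void main(String[] args) {
         int  original = -2;               // 32-bit signed: 11111111 11111111 11111111 11111110
         byte narrowed = (byte) original;  // 8-bit signed:  11111110  (-2)
         int  widened  = narrowed;         // sign-extended back to 32 bits: still -2
         char asChar   = (char) widened;   // truncated to 16 bits, sign discarded: 11111111 11111110
         int  result   = asChar;           // zero-extended to 32 bits: 65534

         System.out.println(result);       // prints 65534
     }
 }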

So yes, when you look at it this way, the byte cast is significant (academically speaking), even though its effect on the result is negligible (oh, the joys of programming, where a significant action can have little effect). The effects come from narrowing and widening while preserving the sign, whereas the conversion to char narrows and is then widened without sign extension.

(Note: I used the # symbol to indicate the sign bit; as noted, there is no sign bit for char, as it is an unsigned type.)

I used parentheses to show what is conceptually going on behind the scenes. The data types are really truncated to their own logical blocks, but if you look at them as ints, their values correspond to the bits shown in parentheses.

Signed values are always widened with the value of the sign bit. Unsigned values are always widened with zero bits.

So the trick (or gotcha) here is that widening from byte to int sign-extends the value, which is then narrowed the moment you cast to char; at that point the sign bit is discarded.

If the intermediate conversion to int did not happen, the value would be 254. But it does, so it isn't.
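You can simulate the no-sign-extension alternative by masking the byte down to its low 8 bits before the cast; this is only an illustration of the point above, not something the original expression does:

 public class NoSignExtension {
     public static void main(String[] args) {
         byte b = (byte) -2;                           // bit pattern 11111110

         // What actually happens: sign-extend to int, then narrow to char.
         System.out.println((int) (char) b);           // 65534

         // What would happen if the byte were zero-extended instead:
         System.out.println((int) (char) (b & 0xFF));  // 254
     }
 }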

+6
Jul 09 '14 at 20:44


