Java: why does this behave differently with English and Slavic characters?

I came across something that looks strange to me while working with Java. It may be well known, but I don't understand why it works this way.

I have code like this:

    Character x = 'B';
    Object o = x;
    System.out.println(o == 'B');

It works fine, and the output is true. Then I change the English B to the Slavic B (Б):

    Character x = 'Б';
    Object o = x;
    System.out.println(o == 'Б');

Now the output is false. How so? By the way, the output is still true if I compare the variable x with the character directly, but when I go through an Object, it behaves differently.

Can anyone explain this behavior?

+7
java object compare character
2 answers

Without boxing (using only char) everything works as expected. Likewise, if you use equals instead of ==, everything works. The problem is that you are comparing references to boxed values with ==, which only checks reference identity. The difference you see comes from how autoboxing works. You can see the same thing with Integer:

    Object x = 0;
    Object y = 0;
    System.out.println(x == y); // Guaranteed to be true

    Object x = 10000;
    Object y = 10000;
    System.out.println(x == y); // *May* be true

Basically, "small" values have cached boxed representations, whereas "large" values may not.
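As a minimal sketch of that difference, the snippet below boxes the same int twice through Integer.valueOf and compares the two boxes by identity. The class name IntegerCacheDemo and the helper sameBox are made up for this illustration. Only the results for values between -128 and 127 are guaranteed; the result for 10000 depends on the JVM and its settings, so no output is claimed for it.

```java
public class IntegerCacheDemo {
    // Hypothetical helper: boxes the same int twice and reports
    // whether both boxes are one and the same object.
    static boolean sameBox(int value) {
        Integer a = Integer.valueOf(value);
        Integer b = Integer.valueOf(value);
        return a == b; // identity comparison on purpose, not value equality
    }

    public static void main(String[] args) {
        System.out.println(sameBox(0));     // true: inside the always-cached range
        System.out.println(sameBox(127));   // true: upper bound of the guaranteed range
        System.out.println(sameBox(10000)); // unspecified: may be true or false

        // Value comparison does not depend on caching at all.
        System.out.println(Integer.valueOf(10000).equals(10000)); // true
    }
}
```

If the goal is to compare values, equals (or unboxing to int first) gives the same answer for every value, cached or not.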

From JLS 5.1.7:

If the value p being boxed is an integer literal of type int between -128 and 127 inclusive (§3.10.1), or the boolean literal true or false (§3.10.3), or a character literal between '\u0000' and '\u007f' inclusive (§3.10.4), then let a and b be the results of any two boxing conversions of p. It is always the case that a == b.

Ideally, boxing a primitive value would always yield an identical reference. In practice, this may not be feasible using existing implementation techniques. The rule above is a pragmatic compromise, requiring that certain common values always be boxed into indistinguishable objects. The implementation may cache these, lazily or eagerly. For other values, the rule disallows any assumptions about the identity of the boxed values on the programmer's part. This allows (but does not require) sharing of some or all of these references. Notice that integer literals of type long are allowed, but not required, to be shared.

This ensures that in most common cases, the behavior will be the desired one, without imposing an undue performance penalty, especially on small devices. Less memory-limited implementations might, for example, cache all char and short values, as well as int and long values in the range of -32K to +32K.

The "character literal between '\u0000' and '\u007f'" part guarantees that boxed ASCII characters are cached, but gives no such guarantee for boxed non-ASCII characters.
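The two fixes mentioned at the start of this answer (compare with equals, or compare as char) can be sketched as follows. The class name BoxedCharCompare and the helper sameChar are hypothetical, and '\u0411' (Cyrillic capital letter BE) is used here only as an example of a character outside the ASCII range.

```java
public class BoxedCharCompare {
    // Hypothetical helper: compares a boxed value held as Object
    // against a char by value rather than by reference identity.
    static boolean sameChar(Object boxed, char expected) {
        return boxed instanceof Character
                && ((Character) boxed).charValue() == expected;
    }

    public static void main(String[] args) {
        char cyrillic = '\u0411'; // outside '\u0000'..'\u007f', so caching is not guaranteed
        Object o = cyrillic;      // autoboxed to a Character

        System.out.println(o.equals(cyrillic));    // true: equals compares the char values
        System.out.println(sameChar(o, cyrillic)); // true: unboxed, primitive comparison
        System.out.println(sameChar(o, 'B'));      // false: different characters
    }
}
```

Both approaches give the same result whether or not the boxed object happens to come from a cache.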

+8

When you write

    Character x = 'B';

the compiler emits a call to Character.valueOf(char):

 2: invokestatic #16 // Method java/lang/Character.valueOf:(C)Ljava/lang/Character; 

which caches small values. Its Javadoc says:

This method will always cache values in the range '\u0000' to '\u007F', inclusive, and may cache other values outside of this range.

    public static Character valueOf(char c) {
        if (c <= 127) { // must cache
            return CharacterCache.cache[(int) c];
        }
        return new Character(c);
    }
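The effect of that cache can be observed directly by calling Character.valueOf twice with the same argument. This is only an illustrative sketch: the class ValueOfCacheDemo and the helper sameInstance are invented names, and the result for characters above '\u007F' is deliberately left without a claimed output because the documentation allows either outcome.

```java
public class ValueOfCacheDemo {
    // Hypothetical helper: true when two valueOf calls for the same char
    // hand back the very same Character object.
    static boolean sameInstance(char c) {
        return Character.valueOf(c) == Character.valueOf(c);
    }

    public static void main(String[] args) {
        System.out.println(sameInstance('B'));      // true: within '\u0000'..'\u007F'
        System.out.println(sameInstance('\u007F')); // true: last value of the guaranteed range
        System.out.println(sameInstance('\u0411')); // not specified: may be cached or not

        // equals compares the wrapped char, so it is true either way.
        System.out.println(
                Character.valueOf('\u0411').equals(Character.valueOf('\u0411'))); // true
    }
}
```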

Related:

  • Integer wrapper class and == operator - where is the behavior specified?
+2