Java. Why does this work differently with English and Slavic characters?

Question

I ran into something that seems strange to me while working with Java. It may be well-known behavior, but I don't understand why it works this way.

I have code like this:

    Character x = 'B';
    Object o = x;
    System.out.println(o == 'B');

It works fine, and the output is true. Then I change the Latin B to the Cyrillic В (U+0412):

    Character x = 'В';
    Object o = x;
    System.out.println(o == 'В');

Now the output is false. How so? By the way, the output is still true if I compare the variable x with 'В' directly, but when I go through an Object, it behaves differently.

Can anyone explain this behavior?

+7

java object compare character

user2452103 10 Sep '14 at 16:46

2 answers

When you write

    Character x = 'B';

the compiler emits a call to Character.valueOf(char):

 2: invokestatic #16 // Method java/lang/Character.valueOf:(C)Ljava/lang/Character;

which caches small values. From its Javadoc:

This method will always cache values in the range '\u0000' to '\u007F', inclusive, and may cache other values outside of this range.

    public static Character valueOf(char c) {
        if (c <= 127) { // must cache
            return CharacterCache.cache[(int) c];
        }
        return new Character(c);
    }
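A minimal sketch of what this caching means for identity comparison. The class and method names (CharCacheDemo, boxesToSameObject) are made up for illustration. Only the ASCII result is guaranteed; whether a non-ASCII character is cached depends on the implementation, so no output is claimed for it.

```java
public class CharCacheDemo {

    // Boxes the same char twice and reports whether both boxes are the same object.
    static boolean boxesToSameObject(char c) {
        Character first = Character.valueOf(c);
        Character second = Character.valueOf(c);
        return first == second; // reference comparison, not value comparison
    }

    public static void main(String[] args) {
        // 'B' is in the range '\u0000'..'\u007F', so the cached instance is reused.
        System.out.println(boxesToSameObject('B')); // true

        // Cyrillic В (U+0412) is outside that range: the result is implementation-dependent.
        System.out.println(boxesToSameObject('\u0412'));

        // Comparing by value works regardless of caching.
        System.out.println(Character.valueOf('\u0412').equals(Character.valueOf('\u0412'))); // true
    }
}
```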

Related:

Integer wrapper class and == operator - where is the behavior specified?

+2

Jigar Joshi 10 Sep '14 at 16:50

Jon Skeet · Accepted Answer · 2014-09-10T16:49:33+0000

Without boxing (using only char) everything will be fine. Similarly, if you use equals instead of ==, everything will be fine. The problem is that you are comparing references to boxed values with ==, which only checks reference identity. You see the difference because of how auto-boxing works. You can see the same thing with Integer:

    Object x = 0;
    Object y = 0;
    System.out.println(x == y); // Guaranteed to be true

    Object a = 10000;
    Object b = 10000;
    System.out.println(a == b); // *May* be true

Basically, "small" values have cached boxed representations, while "large" values may not.
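As a sketch of the two fixes mentioned above (equals, or comparing as char), the following compares a boxed character held in an Object by value. The class and helper names are hypothetical. The Cyrillic letter is written as the escape '\u0412' so the example does not depend on source-file encoding.

```java
public class BoxedCharCompare {

    // Compares by value: 'expected' is boxed and Character.equals checks the char values.
    static boolean matchesWithEquals(Object boxed, char expected) {
        return boxed.equals(expected);
    }

    // Compares by value: casting to Character lets == unbox it and compare as char.
    static boolean matchesAfterUnboxing(Object boxed, char expected) {
        return boxed instanceof Character && ((Character) boxed == expected);
    }

    public static void main(String[] args) {
        Object latin = 'B';         // boxed, within the guaranteed cache range
        Object cyrillic = '\u0412'; // boxed, outside the guaranteed cache range

        System.out.println(matchesWithEquals(latin, 'B'));              // true
        System.out.println(matchesWithEquals(cyrillic, '\u0412'));      // true
        System.out.println(matchesAfterUnboxing(cyrillic, '\u0412'));   // true
    }
}
```

Both helpers give the same answer for ASCII and non-ASCII characters because neither relies on two boxing operations producing the same object.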

From JLS 5.1.7:

If the value p being boxed is an integer literal of type int between -128 and 127 inclusive (§3.10.1), or the boolean literal true or false (§3.10.3), or a character literal between '\u0000' and '\u007f' inclusive (§3.10.4), then let a and b be the results of any two boxing conversions of p. It is always the case that a == b.
Ideally, boxing a primitive value would always yield an identical reference. In practice, this may not be feasible using existing implementation techniques. The rule above is a pragmatic compromise, requiring that certain common values always be boxed into indistinguishable objects. The implementation may cache these, lazily or eagerly. For other values, the rule disallows any assumptions about the identity of the boxed values on the programmer's part. This allows (but does not require) sharing of some or all of these references. Notice that integer literals of type long are allowed, but not required, to be shared.
This ensures that in most common cases, the behavior will be the desired one, without imposing an undue performance penalty, especially on small devices. Less memory-limited implementations might, for example, cache all char and short values, as well as int and long values in the range of -32K to +32K.

The "character literal between '\u0000' and '\u007f'" part guarantees that boxed ASCII characters are cached, but it makes no such guarantee for boxed non-ASCII characters.
