Why are some ints from 0x0000 to 0xFFFF not defined Unicode characters?

Question

I read in the Java Character documentation that:

The set of characters from U+0000 to U+FFFF is sometimes referred to as the Basic Multilingual Plane (BMP).

But I tried the following code and found that 2492 ints are not defined! Is something wrong, or am I misunderstanding something? Thanks!

    public static void main(String[] args) {
        int count = 0;
        for (int i = 0x0000; i < 0xFFFF; i++) {
            if (!Character.isDefined(i)) {
                count++;
            }
        }
        System.out.println(count);
    }

Output:

2492

java unicode

Harry.Chen Jul 6 '15 at 9:31

1 answer

一二三 · Accepted Answer · 2015-07-06T13:00:12+0000

The documentation for isDefined() states that a character is "defined" if it has an entry in the UnicodeData file or falls within a range defined by that file. In other words, it identifies the set of code points that have been assigned to characters (so it might better have been called isAssigned()). As you found out, not every code point in the Basic Multilingual Plane has been assigned to a character yet (this map shows where some of the gaps are).
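To make the "defined means assigned" reading concrete, here is a minimal sketch that counts the BMP code points in two ways: with isDefined(), and by checking whether the general category reported by Character.getType() is Character.UNASSIGNED. The class and method names are made up for illustration. On the JDKs I am aware of the two counts agree, but the exact number depends on which Unicode version your JDK ships with, so do not expect it to match the 2492 from the question.

```java
// Hypothetical demo class; the names are illustrative, not from the Java API.
class DefinedVsAssigned {

    /** Counts code points in U+0000..U+FFFF (inclusive) that isDefined() rejects. */
    static int countUndefinedInBmp() {
        int count = 0;
        for (int cp = 0x0000; cp <= 0xFFFF; cp++) {
            if (!Character.isDefined(cp)) {
                count++;
            }
        }
        return count;
    }

    /** Counts code points in U+0000..U+FFFF (inclusive) whose general category is UNASSIGNED. */
    static int countUnassignedInBmp() {
        int count = 0;
        for (int cp = 0x0000; cp <= 0xFFFF; cp++) {
            if (Character.getType(cp) == Character.UNASSIGNED) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The value printed varies with the Unicode version bundled in the JDK.
        System.out.println("not defined: " + countUndefinedInBmp());
        System.out.println("unassigned:  " + countUnassignedInBmp());
    }
}
```

Note that this loop uses <= 0xFFFF, so it also visits U+FFFF itself, which the loop in the question skips.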

However, even if a code point has not been assigned (i.e. isDefined() returns false), it may be assigned in a future version of Unicode, and it is still a valid code point. Encoding, decoding, and otherwise working with unassigned code points is perfectly acceptable (although a bit unusual).
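As a rough illustration of that last point, the sketch below picks the first BMP code point that the running JDK reports as not defined, checks that it is still a valid code point, and round-trips it through UTF-8. The class and helper names are hypothetical; only the Character, String and StandardCharsets calls come from the standard library.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative, not from the Java API.
class UnassignedRoundTrip {

    /** Returns the first code point in U+0000..U+FFFF that isDefined() rejects, or -1 if there is none. */
    static int firstUndefinedBmpCodePoint() {
        for (int cp = 0x0000; cp <= 0xFFFF; cp++) {
            if (!Character.isDefined(cp)) {
                return cp;
            }
        }
        return -1;
    }

    /** Encodes a single code point as UTF-8, decodes it again, and returns the decoded code point. */
    static int roundTripUtf8(int codePoint) {
        String original = new String(Character.toChars(codePoint));
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        String decoded = new String(utf8, StandardCharsets.UTF_8);
        return decoded.codePointAt(0);
    }

    public static void main(String[] args) {
        int cp = firstUndefinedBmpCodePoint();
        System.out.printf("code point:   U+%04X%n", cp);
        System.out.println("isDefined:    " + Character.isDefined(cp));
        System.out.println("valid:        " + Character.isValidCodePoint(cp));
        System.out.println("round-trips:  " + (roundTripUtf8(cp) == cp));
    }
}
```

Which code point is found first depends on the Unicode version in your JDK, so the sketch searches for one rather than hard-coding a value.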
