Why are some ints from 0x0000 to 0xFFFF not defined Unicode characters?

I read in the Java Character documentation that:

The set of characters from U+0000 to U+FFFF is sometimes referred to as the Basic Multilingual Plane (BMP).

But I tried the following code and found that 2492 ints are undefined! Is something wrong, or am I misunderstanding something? Thanks!

    public static void main(String[] args) {
        int count = 0;
        // Note: i < 0xFFFF stops at U+FFFE, so U+FFFF itself is not checked.
        for (int i = 0x0000; i < 0xFFFF; i++) {
            if (!Character.isDefined(i)) {
                count++;
            }
        }
        System.out.println(count);
    }

Output:

2492

1 answer

The documentation for isDefined() states that a character is "defined" if it has an entry in the UnicodeData file or has a value in a range defined by that file. In other words, it identifies the set of code points that have been assigned to characters (so it might have been better named isAssigned()). As you found out, not all code points in the Basic Multilingual Plane have been assigned to characters yet (this map shows where some of the gaps are).
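To see where those gaps are on your own JVM, here is a minimal sketch that groups the undefined code points into contiguous ranges. The class name BmpGaps and the helper undefinedRanges are made up for illustration. The exact ranges and the total depend on the Unicode version your JDK supports, so I am not showing any output.

```java
import java.util.ArrayList;
import java.util.List;

public class BmpGaps {
    /**
     * Returns inclusive {start, end} ranges of code points in [from, to]
     * for which Character.isDefined is false.
     */
    public static List<int[]> undefinedRanges(int from, int to) {
        List<int[]> ranges = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a gap"
        for (int cp = from; cp <= to; cp++) {
            if (!Character.isDefined(cp)) {
                if (start < 0) {
                    start = cp; // a new gap begins here
                }
            } else if (start >= 0) {
                ranges.add(new int[] {start, cp - 1}); // the gap ended at cp - 1
                start = -1;
            }
        }
        if (start >= 0) {
            ranges.add(new int[] {start, to}); // the gap runs to the end of the interval
        }
        return ranges;
    }

    public static void main(String[] args) {
        int total = 0;
        // Unlike the loop in the question, this includes U+FFFF.
        for (int[] r : undefinedRanges(0x0000, 0xFFFF)) {
            System.out.printf("U+%04X..U+%04X%n", r[0], r[1]);
            total += r[1] - r[0] + 1;
        }
        System.out.println("undefined code points in the BMP: " + total);
    }
}
```

Because this loop includes U+FFFF, the total it prints will differ from the count in the question even on the same JDK.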

However, even if a code point has not been assigned (i.e. isDefined() returns false), it is still a valid code point and may be assigned in a future version of Unicode. Encoding, decoding, and otherwise working with unassigned code points is perfectly acceptable (although a bit strange).
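As a rough illustration of that last point, the sketch below takes one code point and checks that it is still valid and survives a UTF-8 round trip. The class name UnassignedRoundTrip and the helper roundTripUtf8 are hypothetical. The choice of U+0378 is an assumption: it is unassigned in the Unicode versions I am aware of, but a later version could assign it.

```java
import java.nio.charset.StandardCharsets;

public class UnassignedRoundTrip {
    /** Encodes one code point as UTF-8, decodes it again, and returns the decoded code point. */
    public static int roundTripUtf8(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        return decoded.codePointAt(0);
    }

    public static void main(String[] args) {
        int cp = 0x0378; // assumption: unassigned in the Unicode version your JDK supports
        System.out.println("isDefined:        " + Character.isDefined(cp));
        System.out.println("isValidCodePoint: " + Character.isValidCodePoint(cp));
        System.out.println("type UNASSIGNED:  " + (Character.getType(cp) == Character.UNASSIGNED));
        System.out.println("round trip ok:    " + (roundTripUtf8(cp) == cp));
    }
}
```

Character.isValidCodePoint only checks that the value lies between U+0000 and U+10FFFF, which is why it can be true for a code point that isDefined rejects.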
