Check the range of Unicode character values

In Objective-C:

If I have a character such as an emoji, how can I get its Unicode value and then determine whether it falls within a certain range of values?

For example, I want to know whether a specific character is in the Unicode range U+1F300 to U+1F6FF.

Answer

NSString stores text internally as UTF-16, so code points in the range you are looking for (U+1F300 to U+1F6FF) are stored as a surrogate pair: two 16-bit code units, four bytes in total. Despite its name, characterAtIndex: (like the unichar type it returns) knows nothing about code points; it simply returns the single 16-bit code unit at the index you give it. The 55357 (0xD83D) you are seeing is the high (leading) surrogate of such a pair, not the code point itself.
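To see where a value like 55357 comes from, here is a minimal sketch in plain C (which also compiles as Objective-C) of how a code point above U+FFFF is split into a UTF-16 surrogate pair. The function name to_surrogate_pair is hypothetical, and U+1F600 is only an example character chosen for illustration. Note that every code point from U+1F400 to U+1F7FF shares the high surrogate 0xD83D, while U+1F300 to U+1F3FF use 0xD83C, so the high surrogate alone cannot tell you whether a character is inside U+1F300 to U+1F6FF.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: split a supplementary-plane code point
   (U+10000..U+10FFFF) into the two UTF-16 code units that a
   UTF-16 string such as NSString stores for it. */
static void to_surrogate_pair(uint32_t cp, uint16_t *high, uint16_t *low)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);  /* only these need a pair */
    uint32_t v = cp - 0x10000;                /* leaves a 20-bit value */
    *high = (uint16_t)(0xD800 + (v >> 10));   /* top 10 bits -> high surrogate */
    *low  = (uint16_t)(0xDC00 + (v & 0x3FF)); /* bottom 10 bits -> low surrogate */
}
```

For U+1F600 this yields the code units 0xD83D and 0xDE00; characterAtIndex: on the first of them returns 0xD83D, which is 55357 in decimal.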

To check the actual code points, you need to convert the string (or the individual characters) to UTF-32, which encodes each code point directly as one 32-bit value. You have several options:

  • Get both UTF-16 code units that make up the character, and use the standard surrogate-pair decoding algorithm or CFStringGetLongCharacterForSurrogatePair to convert the surrogate pair to a UTF-32 code point.

  • Use dataUsingEncoding: or getBytes:maxLength:usedLength:encoding:options:range:remainingRange: to convert the NSString to UTF-32, and interpret the raw bytes as uint32_t values.

  • Use a library like ICU.
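The first option can be sketched in plain C, which is also valid Objective-C. This is a minimal illustration under the assumption that you already have the string's UTF-16 code units in a buffer; all function names here are hypothetical. decode_surrogate_pair uses the standard UTF-16 arithmetic, which should give the same result as CFStringGetLongCharacterForSurrogatePair, and in_range performs the check from the question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if u is a UTF-16 high (leading) surrogate, 0xD800..0xDBFF. */
static bool is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

/* True if u is a UTF-16 low (trailing) surrogate, 0xDC00..0xDFFF. */
static bool is_low_surrogate(uint16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

/* Combine a high/low surrogate pair into one UTF-32 code point. */
static uint32_t decode_surrogate_pair(uint16_t high, uint16_t low)
{
    assert(is_high_surrogate(high) && is_low_surrogate(low));
    return 0x10000u + (((uint32_t)high - 0xD800u) << 10) + ((uint32_t)low - 0xDC00u);
}

/* Hypothetical helper: read the code point starting at index i of a UTF-16
   buffer of length len, storing the number of code units consumed (1 or 2)
   in *units. A lone (unpaired) surrogate is returned unchanged. */
static uint32_t code_point_at(const uint16_t *buf, size_t len, size_t i, size_t *units)
{
    uint16_t u = buf[i];
    if (is_high_surrogate(u) && i + 1 < len && is_low_surrogate(buf[i + 1])) {
        *units = 2;
        return decode_surrogate_pair(u, buf[i + 1]);
    }
    *units = 1;  /* a BMP character occupies a single code unit */
    return u;
}

/* The range check itself: is cp within lo..hi, inclusive? */
static bool in_range(uint32_t cp, uint32_t lo, uint32_t hi) { return cp >= lo && cp <= hi; }
```

In an Objective-C file, the buffer could be filled with NSString's getCharacters:range: (unichar is a 16-bit type), then you can walk it with code_point_at, advancing by *units each step, and test each result with in_range(cp, 0x1F300, 0x1F6FF).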

