Check the range of Unicode character values

Question

In Objective-C:

If I have a character like "Δ", how can I get its Unicode value and then determine whether it falls within a certain range of values?

For example, how can I tell whether a specific character is in the Unicode range from U+1F300 to U+1F6FF?

string ios objective-c unicode unicode-escapes

Albert ranshaw Feb 11 '13 at 23:20

1 answer

一二三 · Accepted Answer · 2013-02-14T04:52:09+0000

NSString stores text internally as UTF-16, so code points in the range you are looking for (U+1F300 to U+1F6FF) are stored as a surrogate pair: two 16-bit code units, four bytes in total. Despite its name, characterAtIndex: (and unichar) knows nothing about code points; it simply returns the two-byte code unit at the index you give it. The 55357 you are seeing is the lead (high) surrogate of the code point in UTF-16.
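To see where a value like 55357 comes from, here is a minimal sketch in plain C (which also compiles inside an Objective-C file) of how UTF-16 splits a code point above U+FFFF into two code units. The function name split_into_surrogates is made up for illustration and is not part of any Apple API.

```c
#include <stdint.h>

/* Hypothetical helper: split a supplementary code point
   (U+10000..U+10FFFF) into its UTF-16 lead and trail surrogates.
   Returns 1 on success, 0 if cp is outside that range. */
static int split_into_surrogates(uint32_t cp, uint16_t *lead, uint16_t *trail)
{
    if (cp < 0x10000 || cp > 0x10FFFF) {
        return 0; /* code points up to U+FFFF fit in a single code unit */
    }
    uint32_t offset = cp - 0x10000;                 /* 20-bit value */
    *lead  = (uint16_t)(0xD800 + (offset >> 10));   /* top 10 bits */
    *trail = (uint16_t)(0xDC00 + (offset & 0x3FF)); /* low 10 bits */
    return 1;
}
```

For U+1F600, taken here only as an example, the lead surrogate comes out as 0xD83D, which is 55357 in decimal, and the trail surrogate as 0xDE00. Every code point from U+1F400 to U+1F7FF shares that lead surrogate, so the first code unit alone cannot tell you which character you have.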

To check the actual code points, you need to convert the string or characters to UTF-32, which encodes each code point directly as a single 32-bit value. To do this, you have several options:

Get all the UTF-16 code units that make up the code point, and use the standard surrogate-pair decoding algorithm or CFStringGetLongCharacterForSurrogatePair to convert surrogate pairs to UTF-32.
Use dataUsingEncoding: or getBytes:maxLength:usedLength:encoding:options:range:remainingRange: to convert the NSString to UTF-32 and interpret the raw bytes as uint32_t.
Use a library like ICU.
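The first option can be sketched in plain C, which also compiles inside an Objective-C file. This is an illustration under stated assumptions, not the only way to do it: the function names below are made up, and the buffer of UTF-16 code units is assumed to have been filled beforehand, for example with NSString's getCharacters:range:. The pair arithmetic is the standard UTF-16 decoding, the same conversion that CFStringGetLongCharacterForSurrogatePair performs.

```c
#include <stddef.h>
#include <stdint.h>

static int is_lead_surrogate(uint16_t u)  { return u >= 0xD800 && u <= 0xDBFF; }
static int is_trail_surrogate(uint16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

/* Hypothetical helper: decode the code point that starts at units[i].
   *consumed is set to the number of code units read (1 or 2).
   An unpaired surrogate is returned unchanged as a single unit. */
static uint32_t code_point_at(const uint16_t *units, size_t length,
                              size_t i, size_t *consumed)
{
    uint16_t first = units[i];
    if (is_lead_surrogate(first) && i + 1 < length &&
        is_trail_surrogate(units[i + 1])) {
        *consumed = 2;
        return 0x10000 + (((uint32_t)(first - 0xD800) << 10) |
                          (uint32_t)(units[i + 1] - 0xDC00));
    }
    *consumed = 1;
    return first;
}

/* Hypothetical helper: returns 1 if any code point in the buffer
   lies in the inclusive range [lo, hi], otherwise 0. */
static int contains_code_point_in_range(const uint16_t *units, size_t length,
                                        uint32_t lo, uint32_t hi)
{
    size_t i = 0;
    while (i < length) {
        size_t consumed;
        uint32_t cp = code_point_at(units, length, i, &consumed);
        if (cp >= lo && cp <= hi) {
            return 1;
        }
        i += consumed; /* step over both halves of a surrogate pair */
    }
    return 0;
}
```

Calling contains_code_point_in_range(units, length, 0x1F300, 0x1F6FF) on the code units 0xD83D 0xDE00 reports a match, because the pair decodes to U+1F600, while the single unit 0x0394 ("Δ") does not match.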
