NSString stores its contents as UTF-16 internally, so code points in the range you are looking for (U+1F300 to U+1F6FF) are stored as a surrogate pair (two 16-bit code units, four bytes in total). Despite its name, characterAtIndex: (and unichar) knows nothing about code points: it returns the single 16-bit code unit at the index you give it. The 55357 you are seeing is the lead (high) surrogate of the code point's UTF-16 encoding.
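To see where 55357 comes from, here is a minimal sketch in plain C (which also compiles inside an Objective-C file) of the standard UTF-16 encoding step. The function name encode_surrogates is made up for illustration and is not part of any Apple API.

```c
#include <stdint.h>

/* Hypothetical helper: split a supplementary code point
   (U+10000..U+10FFFF) into its UTF-16 lead and trail surrogates. */
static void encode_surrogates(uint32_t cp, uint16_t *lead, uint16_t *trail) {
    uint32_t v = cp - 0x10000;                  /* 20-bit offset */
    *lead  = (uint16_t)(0xD800 + (v >> 10));    /* upper 10 bits */
    *trail = (uint16_t)(0xDC00 + (v & 0x3FF));  /* lower 10 bits */
}
```

For U+1F600 this yields a lead surrogate of 0xD83D (55357) and a trail surrogate of 0xDE00 (56832). Every code point from U+1F400 to U+1F6FF shares that lead surrogate, so the value returned by characterAtIndex: does not identify the character by itself.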
To check the raw code points, you need to convert the string (or the individual characters) to UTF-32, which encodes each code point directly as one 32-bit value. You have several options:
Get both UTF-16 code units that make up the code point, and use the standard surrogate-pair decoding algorithm or CFStringGetLongCharacterForSurrogatePair to convert the pair to UTF-32.
Use dataUsingEncoding: or getBytes:maxLength:usedLength:encoding:options:range:remainingRange: to convert the NSString to UTF-32 and interpret the raw bytes as uint32_t.
Use a library such as ICU.
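The first option can be sketched in plain C as follows. This is only an illustration under stated assumptions: decode_utf16 and in_emoji_range are hypothetical names, and the code units are assumed to have already been copied out of the string into a buffer (for example with getCharacters:range:). The arithmetic should give the same result as CFStringGetLongCharacterForSurrogatePair for a valid pair.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode the code point that starts at units[i].
   *consumed is set to the number of code units used (1 or 2). */
static uint32_t decode_utf16(const uint16_t *units, size_t len,
                             size_t i, size_t *consumed) {
    uint16_t u = units[i];
    *consumed = 1;
    /* A lead surrogate followed by a trail surrogate forms one code point. */
    if (u >= 0xD800 && u <= 0xDBFF && i + 1 < len) {
        uint16_t t = units[i + 1];
        if (t >= 0xDC00 && t <= 0xDFFF) {
            *consumed = 2;
            return 0x10000 + (((uint32_t)(u - 0xD800)) << 10)
                           + (uint32_t)(t - 0xDC00);
        }
    }
    /* BMP code unit, or an unpaired surrogate returned unchanged. */
    return u;
}

/* Hypothetical helper: the range from the question, U+1F300..U+1F6FF. */
static bool in_emoji_range(uint32_t cp) {
    return cp >= 0x1F300 && cp <= 0x1F6FF;
}
```

When walking a string, advance the index by the value stored in consumed so that the trail surrogate is not examined a second time as if it were a separate character.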