In @metatation's answer, the offset is in bytes, not characters. The text in your database is probably UTF-8 encoded Unicode, in which case any character outside ASCII is represented by several bytes. Examples of non-ASCII characters include accented characters (à, â, etc.), smart quotes, and characters from non-Latin character sets (Greek, Cyrillic, most Asian character sets, etc.).
If the bytes in the SQLite database are UTF-8 encoded Unicode strings, you can work out the true Unicode character offset for a given byte offset, for example:
NSUInteger characterOffsetForByteOffsetInUTF8String(NSUInteger byteOffset, const char *string)
{
    NSUInteger characterOffset = 0;
    for (NSUInteger i = 0; i < byteOffset; i++) {
        char c = string[i];
        // UTF-8 continuation bytes have the form 10xxxxxx; every other byte starts a new character.
        if ((c & 0xc0) != 0x80) {
            characterOffset++;
        }
    }
    return characterOffset;
}
Warning: if you use these character offsets to index into an NSString, remember that NSString uses UTF-16 under the hood, so characters with a Unicode code point above U+FFFF are stored as a pair of 16-bit values. You usually will not come across this in textual content, but if you care about particularly obscure character sets or some of the non-textual characters that Unicode can represent, such as emoji, then this algorithm will need to be extended to handle them.
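One possible extension along those lines is sketched below in plain C. It is not from the original project: the function name `utf16OffsetForByteOffsetInUTF8String` is made up for illustration, `size_t` stands in for `NSUInteger`, and it assumes the input is valid UTF-8 and that `byteOffset` falls on a character boundary. It counts UTF-16 code units instead of code points by adding 2 for every 4-byte sequence, since those encode code points above U+FFFF.

```c
#include <stddef.h>

/* Hypothetical helper: converts a UTF-8 byte offset into a UTF-16 code unit
   offset. Assumes `string` is valid UTF-8 and `byteOffset` lies on a
   character boundary. */
size_t utf16OffsetForByteOffsetInUTF8String(size_t byteOffset, const char *string)
{
    size_t utf16Offset = 0;
    for (size_t i = 0; i < byteOffset; i++) {
        unsigned char c = (unsigned char)string[i];
        if ((c & 0xC0) == 0x80) {
            continue; /* continuation byte (10xxxxxx): part of the previous character */
        }
        /* A lead byte of 0xF0 or above starts a 4-byte sequence, i.e. a code
           point above U+FFFF, which takes two UTF-16 code units (a surrogate
           pair). Every other lead byte maps to a single code unit. */
        utf16Offset += (c >= 0xF0) ? 2 : 1;
    }
    return utf16Offset;
}
```

Because NSString indexes by UTF-16 code unit, the value returned here should be usable directly as an NSString index, under the assumptions stated above.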
(The code snippet is from a project of mine; feel free to use it.)