Creating a normalized version of a string in SQLite - Polish character ł

Apple's DerivedProperty sample shows how to add an extra column to a database that holds a normalized version of the stored text.

There is a normalizeString function that contains the code:

  NSMutableString *result = [NSMutableString stringWithString:unprocessedValue];
  CFStringNormalize((CFMutableStringRef)result, kCFStringNormalizationFormD);
  CFStringFold((CFMutableStringRef)result, kCFCompareCaseInsensitive | kCFCompareDiacriticInsensitive | kCFCompareWidthInsensitive, NULL);

I tested this method. Here is an example of text converted to its normalized version:

  ąĄćłŁÓŻźŃĘęĆ → aacłłozzneec

All diacritics were removed correctly, except for the characters ł and Ł, which only had their case folded.

Is there any other option for proper normalization?

1 answer

I don’t speak Polish, so my answer may be terribly wrong, but according to http://www.unicode.org/Public/6.2.0/ucd/UnicodeData.txt, the characters "ł" and "Ł" are not combinations of a "regular" character with a diacritical mark.

The entry for "ą" in the Unicode data file is

  0105;LATIN SMALL LETTER A WITH OGONEK;Ll;0;L;0061 0328;;;;N;LATIN SMALL LETTER A OGONEK;;0104;;0104

and the sixth field, "0061 0328", indicates that "ą" can be decomposed into "a" and U+0328 (COMBINING OGONEK).

But the entries for "ł" and "Ł" are

  0141;LATIN CAPITAL LETTER L WITH STROKE;Lu;0;L;;;;;N;LATIN CAPITAL LETTER L SLASH;;;0142;
  0142;LATIN SMALL LETTER L WITH STROKE;Ll;0;L;;;;;N;LATIN SMALL LETTER L SLASH;;0141;;0141

where the sixth field is empty, so these characters are not decomposed.
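
The same difference can be observed at runtime. The following is only a minimal sketch, assuming Foundation is available; DecomposedLength is a hypothetical helper name, not something from Apple's sample. It applies canonical decomposition (NFD) through NSString's decomposedStringWithCanonicalMapping and counts the resulting UTF-16 code units.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: length, in UTF-16 code units, of the canonically
// decomposed (NFD) form of a string.
static NSUInteger DecomposedLength(NSString *s) {
    return s.decomposedStringWithCanonicalMapping.length;
}
```

Called with @"\u0105" ("ą"), this should return 2, because the character splits into "a" plus COMBINING OGONEK. Called with @"\u0142" ("ł"), it should return 1, because there is nothing to split off.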

Therefore, I doubt that any built-in function normalizes "ł" to "l"; you would have to do the replacement yourself, for example with

 [result replaceOccurrencesOfString:@"ł" withString:@"l" options:0 range:NSMakeRange(0, [result length])]; 
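
Combined with the code from the question, the whole thing might look like the sketch below. NormalizeString is a hypothetical free-function name, and the __bridge casts assume the file is compiled with ARC (the code in the question uses plain casts). Since the case folding step already turns "Ł" into "ł", as the output in the question shows, a single replacement should be enough.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: decompose, fold case/diacritics/width, then map the
// Polish "ł" to "l" by hand, because it has no canonical decomposition.
static NSString *NormalizeString(NSString *unprocessedValue) {
    if (unprocessedValue == nil) {
        return nil;
    }
    NSMutableString *result = [NSMutableString stringWithString:unprocessedValue];
    CFStringNormalize((__bridge CFMutableStringRef)result, kCFStringNormalizationFormD);
    CFStringFold((__bridge CFMutableStringRef)result,
                 kCFCompareCaseInsensitive | kCFCompareDiacriticInsensitive | kCFCompareWidthInsensitive,
                 NULL);
    // "Ł" has already been folded to "ł" at this point.
    [result replaceOccurrencesOfString:@"ł"
                            withString:@"l"
                               options:0
                                 range:NSMakeRange(0, result.length)];
    return result;
}
```

Other letters without a decomposition (for example "ø" or "đ") would need their own replacements in the same way, if they can occur in your data.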

Source: https://habr.com/ru/post/1211243/

