Turkish String Search

When searching the text Çınaraltı Café for the text Ci using the code

    NSStringCompareOptions options = NSCaseInsensitiveSearch | NSDiacriticInsensitiveSearch | NSWidthInsensitiveSearch;
    NSLocale *locale = [NSLocale localeWithLocaleIdentifier:@"tr"];
    NSRange range = [haystack rangeOfString:needle
                                    options:options
                                      range:NSMakeRange(0, haystack.length)
                                     locale:locale];

I get range.location equal to NSNotFound.

This is not related to the diacritic on the initial Ç, because I get the same result when searching for alti, where the only unusual character is the ı. I also get a valid match when searching for Cafe, even though the text contains a diacritic (the é).

The Apple docs mention this situation in the notes on the locale parameter, and I think I am following them. I guess I am not, though, because it does not work.

How can I get a search for “i” to match both “i” and “ı”?

+7
ios objective-c nsstring localization turkish
4 answers

I did this and it seemed to work well for me. Hope this helps!

    NSString *cleanedHaystack = [haystack stringByReplacingOccurrencesOfString:@"ı" withString:@"i"];
    cleanedHaystack = [cleanedHaystack stringByReplacingOccurrencesOfString:@"İ" withString:@"I"];
    NSString *cleanedNeedle = [needle stringByReplacingOccurrencesOfString:@"ı" withString:@"i"];
    cleanedNeedle = [cleanedNeedle stringByReplacingOccurrencesOfString:@"İ" withString:@"I"];
    NSUInteger options = (NSDiacriticInsensitiveSearch | NSCaseInsensitiveSearch | NSWidthInsensitiveSearch);
    NSRange range = [cleanedHaystack rangeOfString:cleanedNeedle options:options];
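
A possible variation on the snippet above, offered only as a sketch: the helper name ReplaceTurkishI is hypothetical, and the use of NSLiteralSearch is an assumption added here so that each replacement swaps exactly one UTF-16 code unit for another. Under that assumption the cleaned string has the same length as the original, so the range found in the cleaned haystack should also be usable on the original haystack.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: maps dotless ı and dotted İ onto plain i and I.
// NSLiteralSearch keeps the replacement strictly code unit for code unit.
static NSString *ReplaceTurkishI(NSString *s) {
    s = [s stringByReplacingOccurrencesOfString:@"ı"
                                     withString:@"i"
                                        options:NSLiteralSearch
                                          range:NSMakeRange(0, s.length)];
    return [s stringByReplacingOccurrencesOfString:@"İ"
                                        withString:@"I"
                                           options:NSLiteralSearch
                                             range:NSMakeRange(0, s.length)];
}

int main(void) {
    @autoreleasepool {
        NSString *haystack = @"Çınaraltı Café";
        NSString *needle = @"Ci";
        NSStringCompareOptions options = NSCaseInsensitiveSearch |
                                         NSDiacriticInsensitiveSearch |
                                         NSWidthInsensitiveSearch;

        // Search in the cleaned copies of both strings.
        NSRange range = [ReplaceTurkishI(haystack) rangeOfString:ReplaceTurkishI(needle)
                                                         options:options];
        if (range.location != NSNotFound) {
            // Same length as the original, so read the match from the original text.
            NSLog(@"matched: %@", [haystack substringWithRange:range]);
        }
    }
    return 0;
}
```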
+1

I don't know whether this helps as an answer, but perhaps it explains why this is happening.

I should point out that I am not an expert in this matter, but I have been looking into it for my own purposes and did some research.

Looking at the Unicode collation chart for Latin, the characters equivalent to ASCII "i" (\u0069) do not include "ı" (\u0131), whereas all the other letters in your example string behave as you expect, i.e.:

  • "c" (\u0063) includes "Ç" (\u00c7)
  • "e" (\u0065) includes "é" (\u00e9)

The ı character is listed separately, as having a primary difference from i. That may not make sense to a Turkish speaker (I'm not one), but it is what Unicode has to say about it, and it fits the logic of the problem you describe.

In Chrome, you can see this in action with an in-page search. Searching the page for ASCII i highlights all the characters in its block and does not match ı. Searching for ı does the opposite.

By contrast, MySQL's utf8_general_ci collation table maps uppercase ASCII I to ı, as you want.

So, without knowing anything about iOS, I assume that it uses the Unicode standard and normalizes all characters to Latin according to this table.

As for how to match Çınaraltı with Ci: if you cannot override the collation table, then perhaps you can replace i in your search strings with a regular expression, so that you search on Ç[iı] instead.
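
The character-class idea can be sketched in Objective-C roughly as follows. This is an illustration under stated assumptions, not a tested drop-in: the helper name TurkishIPattern is hypothetical, and the needle is escaped first so that any regex metacharacters in it stay literal. Note that NSRegularExpressionSearch can only be combined with NSCaseInsensitiveSearch and NSAnchoredSearch, so this sketch does not fold other diacritics (a plain C in the needle will not match Ç).

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds a regex pattern from a plain needle so that any
// dotted or dotless i (either case) matches all four variants.
static NSString *TurkishIPattern(NSString *needle) {
    // Escape regex metacharacters so the rest of the needle is matched literally.
    NSString *escaped = [NSRegularExpression escapedPatternForString:needle];
    // Widen every i-like letter into the character class [iıIİ].
    return [escaped stringByReplacingOccurrencesOfString:@"[iıIİ]"
                                              withString:@"[iıIİ]"
                                                 options:NSRegularExpressionSearch
                                                   range:NSMakeRange(0, escaped.length)];
}

int main(void) {
    @autoreleasepool {
        NSString *haystack = @"Çınaraltı Café";
        NSString *pattern = TurkishIPattern(@"alti");

        // Treat the pattern as a regular expression, ignoring case.
        NSRange range = [haystack rangeOfString:pattern
                                        options:NSRegularExpressionSearch | NSCaseInsensitiveSearch
                                          range:NSMakeRange(0, haystack.length)];
        NSLog(@"pattern %@ %@", pattern,
              range.location != NSNotFound ? @"matched" : @"did not match");
    }
    return 0;
}
```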

+2

I wrote a simple extension in Swift 3 for Turkish string search.

    let turkishSentence = "Türkçe ya da Türk dili, batıda Balkanlar'dan başlayıp doğuda Hazar Denizi sahasına kadar konuşulan Altay dillerinden biridir."
    let turkishWannabe = "basLayip"
    let shouldBeTrue = turkishSentence.contains(turkishString: turkishWannabe, caseSensitive: false)
    let shouldBeFalse = turkishSentence.contains(turkishString: turkishWannabe, caseSensitive: true)

You can check this from https://github.com/alpkeser/swift_turkish_string_search/blob/master/TurkishTextSearch.playground/Contents.swift

+2

As Tim mentions, we can use a regular expression to match text containing i or ı. I also did not want to add a new field or change the source data, since the search runs over a huge number of strings. So I ended up with a solution that uses regular expressions and NSPredicate.

Create an NSString category and copy this method into it. It returns a basic pattern that matches both i and ı. You can use it with any method that accepts a regular expression pattern.

    - (NSString *)zst_regexForTurkishLettersWithCaseSensitive:(BOOL)caseSensitive {
        NSMutableString *filterWordRegex = [NSMutableString string];
        for (NSUInteger i = 0; i < self.length; i++) {
            NSString *letter = [self substringWithRange:NSMakeRange(i, 1)];
            if (caseSensitive) {
                if ([letter isEqualToString:@"ı"] || [letter isEqualToString:@"i"]) {
                    letter = @"[ıi]";
                } else if ([letter isEqualToString:@"I"] || [letter isEqualToString:@"İ"]) {
                    letter = @"[Iİ]";
                }
            } else {
                if ([letter isEqualToString:@"ı"] || [letter isEqualToString:@"i"] ||
                    [letter isEqualToString:@"I"] || [letter isEqualToString:@"İ"]) {
                    letter = @"[ıiIİ]";
                }
            }
            [filterWordRegex appendString:letter];
        }
        return filterWordRegex;
    }

So, if the search word is Şırnak, it creates Ş[ıi]rnak for a case-sensitive search and Ş[ıiIİ]rnak for a case-insensitive search.

And here are some possible usages.

    NSString *testString = @"Şırnak";

    // First create your search regular expression.
    NSString *searchWord = @"şır";
    NSString *searchPattern = [searchWord zst_regexForTurkishLettersWithCaseSensitive:NO];

    // Then create your matching pattern.
    NSString *pattern = searchPattern; // Direct match
    // NSString *pattern = [NSString stringWithFormat:@".*%@.*", searchPattern]; // Contains
    // NSString *pattern = [NSString stringWithFormat:@"\\b%@.*", searchPattern]; // Begins with

    // NSPredicate
    // c for case insensitive, d for diacritic insensitive
    NSPredicate *predicate = [NSPredicate predicateWithFormat:@"self matches[cd] %@", pattern];
    if ([predicate evaluateWithObject:testString]) {
        // Matches
    }

    // If you want to filter an array of objects
    NSArray *matchedCities = [allAirports filteredArrayUsingPredicate:
        [NSPredicate predicateWithFormat:@"city matches[cd] %@", pattern]];

You can also use NSRegularExpression, but I think that doing a case- and diacritic-insensitive search with NSPredicate is much easier.
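
For comparison, a minimal sketch of the NSRegularExpression route. The helper name FirstMatchRange is hypothetical, and the pattern is written out by hand in the form the category method above produces for a case-insensitive search. NSRegularExpression offers NSRegularExpressionCaseInsensitive but, as far as I know, no diacritic-insensitive option, which is one reason the predicate version may be more convenient.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the range of the first match of `pattern`
// anywhere in `string`, or {NSNotFound, 0} if there is none.
static NSRange FirstMatchRange(NSString *pattern, NSString *string) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return NSMakeRange(NSNotFound, 0);
    }
    // Returns {NSNotFound, 0} when nothing matches.
    return [regex rangeOfFirstMatchInString:string
                                    options:0
                                      range:NSMakeRange(0, string.length)];
}

int main(void) {
    @autoreleasepool {
        // Pattern in the form generated for the search word "şır" with caseSensitive:NO.
        NSRange range = FirstMatchRange(@"ş[ıiIİ]r", @"Şırnak");
        NSLog(@"%@", range.location != NSNotFound ? NSStringFromRange(range) : @"no match");
    }
    return 0;
}
```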

+1