Case- and diacritic-insensitive regular expression matching with metacharacters in Swift

I'm trying to match rude words in user input; for example, "I hate you!" or "i.håté.Yoù" should match "hate you" in an array of words parsed from JSON.

Therefore, I need the matching to be case and diacritic insensitive, and to treat a space in a rude word as any non-letter character: the regex metacharacter \P{L} should work for this, or at least \W.

I know that [cd] works with NSPredicate, for example:
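To make the difference between the two metacharacters concrete, here is a minimal sketch in current Swift (not the Swift 2 syntax used in the snippets in this post). The helper name `found` is made up for illustration. It shows that \P{L} accepts a digit or an underscore as a separator, while \W does not, because both count as word characters.

```swift
import Foundation

// Hypothetical helper: true if `pattern` matches anywhere in `text`, ignoring case.
func found(_ pattern: String, in text: String) -> Bool {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: [.caseInsensitive]) else {
        return false
    }
    // NSRange counts UTF-16 code units, so derive it from the String itself.
    let range = NSRange(text.startIndex..., in: text)
    return regex.firstMatch(in: text, options: [], range: range) != nil
}

let withNonLetter = "hate\\P{L}you"  // separator: any single non-letter character
let withNonWord = "hate\\Wyou"       // separator: any single non-word character

print(found(withNonLetter, in: "I hate you!"))  // space is a non-letter
print(found(withNonLetter, in: "hate1you"))     // digit is a non-letter
print(found(withNonWord, in: "hate1you"))       // digit is a word character
print(found(withNonWord, in: "hate_you"))       // underscore is a word character
```

So \W is the stricter choice of the two: it only accepts punctuation, whitespace and similar characters between the words.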

    func matches(text: String) -> [String]? {
        if let rudeWords = JSON?["words"] as? [String] {
            return rudeWords.filter {
                let pattern = $0.stringByReplacingOccurrencesOfString(" ", withString: "\\P{L}", options: .CaseInsensitiveSearch)
                return NSPredicate(format: "SELF MATCHES[cd] %@", pattern).evaluateWithObject(text)
            }
        } else {
            log.debug("error fetching rude words")
            return nil
        }
    }

This does not work with metacharacters; I think NSPredicate does not parse them, so I tried using NSRegularExpression instead:

    func matches(text: String) -> [String]? {
        if let rudeWords = JSON?["words"] as? [String] {
            return rudeWords.filter {
                do {
                    // A space in a rude word stands for any single non-letter character.
                    let pattern = $0.stringByReplacingOccurrencesOfString(" ", withString: "\\P{L}", options: .CaseInsensitiveSearch)
                    let regex = try NSRegularExpression(pattern: pattern, options: .CaseInsensitive)
                    // NSRange is measured in UTF-16 code units, so use utf16.count rather than characters.count.
                    return regex.matchesInString(text, options: [], range: NSMakeRange(0, text.utf16.count)).count > 0
                } catch _ {
                    log.debug("error parsing rude word regex")
                    return false
                }
            }
        } else {
            log.debug("error fetching rude words")
            return nil
        }
    }

Everything seems to work fine, but I don't know how to make the regex diacritic insensitive, so I tried this (and other solutions, such as transcoding):

 let text = text.stringByFoldingWithOptions(.DiacriticInsensitiveSearch, locale: NSLocale.currentLocale()) 

However, this does not work for me: I check the user input every time a character is typed, and every solution I tried for removing the accents made the application very slow.

Does anyone know if there are any other solutions or am I using this incorrectly?

thanks

EDIT

I was mistaken: what actually made the application slow was matching with \P{L}. I tried the second solution with \W together with the accent-folding line, and now it works fine, even though it matches fewer strings than I originally wanted.
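For anyone who wants to try the same combination, this is roughly what it looks like as a sketch in current Swift (the snippets above use the older Swift 2 names). The type `RudeWordMatcher` and the sample word list are made up for illustration. Compiling the regexes once instead of on every keystroke is my own assumption about what helps here, not something I measured.

```swift
import Foundation

// Hypothetical matcher: compiles one regex per rude word up front, so each
// keystroke only pays for folding the input and running the matches.
struct RudeWordMatcher {
    private let entries: [(word: String, regex: NSRegularExpression)]

    init(words: [String]) {
        entries = words.compactMap { word -> (word: String, regex: NSRegularExpression)? in
            // Strip accents from the stored word as well, so it lines up with the folded input.
            let folded = word.folding(options: .diacriticInsensitive, locale: nil)
            // Escape each part, then let a space stand for one non-word character.
            let pattern = folded
                .components(separatedBy: " ")
                .map { NSRegularExpression.escapedPattern(for: $0) }
                .joined(separator: "\\W")
            guard let regex = try? NSRegularExpression(pattern: pattern, options: [.caseInsensitive]) else {
                return nil  // skip words that do not compile
            }
            return (word: word, regex: regex)
        }
    }

    // Returns the rude words found in `text`, ignoring case and diacritics.
    func matches(in text: String) -> [String] {
        let folded = text.folding(options: .diacriticInsensitive, locale: nil)
        // NSRange is measured in UTF-16 code units.
        let range = NSRange(folded.startIndex..., in: folded)
        return entries
            .filter { $0.regex.firstMatch(in: folded, options: [], range: range) != nil }
            .map { $0.word }
    }
}

let matcher = RudeWordMatcher(words: ["hate you", "idiot"])
print(matcher.matches(in: "i.håté.Yoù"))
print(matcher.matches(in: "I like you"))
```

The matcher would be created once (for example, when the JSON is loaded) and `matches(in:)` called on each text change.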


1 answer

Perhaps it is worth going in a different direction. Instead of folding the input, what if you changed the regex?

Instead of matching against hate.you, you could match against, for example, [h][åæaàâä][t][ëèêeé].[y][o0][ùu] (this is by no means a complete list). If you cannot change the entries in the database, you can simply replace each e with [ëèêeé] after fetching them.

This gives you more control over which characters will match. Notice that I have 0 as a character that matches o. No amount of Unicode folding will let you do that.
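As a rough sketch of that replacement step in current Swift: the table `lookAlikes` and the functions `expandedPattern` and `isRude` are hypothetical names, and the table entries are only illustrative, not a complete list. A space is turned into \P{L} here, as in the question.

```swift
import Foundation

// Illustrative, incomplete table of characters accepted in place of each letter.
let lookAlikes: [Character: String] = [
    "a": "aàâäåæ",
    "e": "eèéêë",
    "o": "o0òóôö",
    "u": "uùúûü",
]

// Builds a pattern in which each known letter becomes a character class.
func expandedPattern(for word: String) -> String {
    return word.lowercased().map { ch -> String in
        if ch == " " { return "\\P{L}" }  // a space stands for any non-letter
        if let variants = lookAlikes[ch] { return "[" + variants + "]" }
        // Escape everything else in case the word contains regex metacharacters.
        return NSRegularExpression.escapedPattern(for: String(ch))
    }.joined()
}

// True if the expanded pattern for `word` matches anywhere in `text`, ignoring case.
func isRude(_ text: String, word: String) -> Bool {
    guard let regex = try? NSRegularExpression(pattern: expandedPattern(for: word),
                                               options: [.caseInsensitive]) else {
        return false
    }
    let range = NSRange(text.startIndex..., in: text)  // UTF-16 based range
    return regex.firstMatch(in: text, options: [], range: range) != nil
}

print(expandedPattern(for: "hate you"))
print(isRude("i.håté.Yoù", word: "hate you"))
print(isRude("I hate y0u", word: "hate you"))
```

Since the expansion depends only on the stored word, the patterns can be built once when the list is loaded, and the input never has to be folded.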
