I'm trying to match rude words in user input. For example, "I hate you!" or "i.håté.Yoù" should match "hate you" in an array of words parsed from JSON.
So I need the match to be case and diacritic insensitive, and to treat spaces in rude words as any non-letter character: the regex metacharacter \P{L} should work for this, or at least \W.
Now, I know that [cd] works with NSPredicate, for example:
    func matches(text: String) -> [String]? {
        if let rudeWords = JSON?["words"] as? [String] {
            return rudeWords.filter {
                let pattern = $0.stringByReplacingOccurrencesOfString(" ", withString: "\\P{L}", options: .CaseInsensitiveSearch)
                return NSPredicate(format: "SELF MATCHES[cd] %@", pattern).evaluateWithObject(text)
            }
        } else {
            log.debug("error fetching rude words")
            return nil
        }
    }
This does not work with the metacharacters; I think NSPredicate does not parse them. So I tried using NSRegularExpression instead, as follows:
    func matches(text: String) -> [String]? {
        if let rudeWords = JSON?["words"] as? [String] {
            return rudeWords.filter {
                do {
                    let pattern = $0.stringByReplacingOccurrencesOfString(" ", withString: "\\P{L}", options: .CaseInsensitiveSearch)
                    let regex = try NSRegularExpression(pattern: pattern, options: .CaseInsensitive)
                    // NSRange lengths are measured in UTF-16 code units, not Characters
                    return regex.matchesInString(text, options: [], range: NSMakeRange(0, text.utf16.count)).count > 0
                } catch _ {
                    log.debug("error parsing rude word regex")
                    return false
                }
            }
        } else {
            log.debug("error fetching rude words")
            return nil
        }
    }
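One detail worth checking in the NSRegularExpression version is the search range. NSRange lengths are counted in UTF-16 code units, so a length taken from the character count can be too short when the input contains combining accents. Here is a small sketch in current Swift syntax (not the Swift 2 syntax used above); the sample string is only an illustration:

```swift
import Foundation

// "e" followed by a combining acute accent (U+0301):
// this is one Character but two UTF-16 code units.
let sample = "hate\u{301}"

let characterCount = sample.count        // counts Characters (grapheme clusters)
let utf16Count = sample.utf16.count      // counts UTF-16 code units, as NSRange does

// Building the range from the string itself avoids counting by hand.
let fullRange = NSRange(sample.startIndex..., in: sample)

print(characterCount, utf16Count, fullRange.length)
```

If the two counts differ, a range built from the character count stops before the end of the string, so a rude word at the very end could be missed.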
That seems to work fine, but I don't know how to make the regex diacritic insensitive, so I tried this (and other solutions, like transcoding):
let text = text.stringByFoldingWithOptions(.DiacriticInsensitiveSearch, locale: NSLocale.currentLocale())
However, this does not work for me: I check the user input every time a character is typed, and every solution I tried for removing accents made the app very slow.
Does anyone know of another solution, or am I using this incorrectly?
Thanks
EDIT
I was actually mistaken: what made the app slow was trying to match with \P{L}. I tried the second solution with \W and with the accent-folding line, and now it works fine, even if it matches fewer strings than I originally wanted.
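For anyone landing here, the combination described in this edit (fold accents away, then match with \W) could be sketched roughly as follows in current Swift. `RudeWordMatcher` and `fold` are hypothetical names, not from my real code, and the word list stands in for the JSON array. Compiling the regexes once up front is an extra assumption on my part, intended to keep the per-keystroke work small:

```swift
import Foundation

// Hypothetical type: compiles one regex per rude word up front, so each
// per-keystroke check only folds the input and runs the searches.
struct RudeWordMatcher {
    private let entries: [(word: String, regex: NSRegularExpression)]

    init(words: [String]) {
        entries = words.compactMap { (word: String) -> (word: String, regex: NSRegularExpression)? in
            // Fold the word, escape each piece, and let a space match any non-word character.
            let pieces = RudeWordMatcher.fold(word)
                .split(separator: " ")
                .map { NSRegularExpression.escapedPattern(for: String($0)) }
            let pattern = pieces.joined(separator: "\\W")
            guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else {
                return nil
            }
            return (word: word, regex: regex)
        }
    }

    // Strips diacritics and case differences, e.g. "håté" becomes "hate".
    static func fold(_ s: String) -> String {
        return s.folding(options: [.diacriticInsensitive, .caseInsensitive], locale: nil)
    }

    // Returns the rude words found anywhere in the (folded) text.
    func matches(in text: String) -> [String] {
        let folded = RudeWordMatcher.fold(text)
        let range = NSRange(folded.startIndex..., in: folded) // UTF-16 based range
        return entries
            .filter { $0.regex.firstMatch(in: folded, options: [], range: range) != nil }
            .map { $0.word }
    }
}

let matcher = RudeWordMatcher(words: ["hate you", "idiot"])
print(matcher.matches(in: "i.håté.Yoù"))
```

Because both the words and the input are folded before matching, the regex itself needs no case-insensitive option. Note that \W treats digits and underscores as word characters, so it is stricter than \P{L} about what may separate the words.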