NSPredicate vs NSString: which is better / faster for finding superstrings?

I have a large number of strings that I want to check for a given substring. There seem to be two reasonable ways to do this.

Option 1: Use the NSString method rangeOfString: and check whether .location is NSNotFound:

    NSRange range = [string rangeOfString:substring];
    return (range.location != NSNotFound);

Option 2: Use the NSPredicate CONTAINS syntax:

    NSPredicate *regex = [NSPredicate predicateWithFormat:@"SELF CONTAINS %@", substring];
    return [regex evaluateWithObject:string];

Which method is better, or is there a good Option 3 that I am completely missing? No, I'm not sure exactly what I mean by "better", but I probably mean faster when iterating over many, many strings.

+4
2 answers

You should benchmark any solution that uses NSPredicate, because in my experience NSPredicate can be very slow.
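A rough way to do that timing is sketched below. It assumes ARC and a Foundation environment; the helper names TimeBlock and CompareSearches are hypothetical, and a single wall-clock run like this is only a crude indication, not a rigorous benchmark.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock seconds.
static NSTimeInterval TimeBlock(void (^work)(void)) {
    CFAbsoluteTime start = CFAbsoluteTimeGetCurrent();
    work();
    return CFAbsoluteTimeGetCurrent() - start;
}

// Hypothetical helper: times the two options from the question on the same data.
static void CompareSearches(NSArray *strings, NSString *needle) {
    __block NSUInteger loopHits = 0;
    NSTimeInterval loopTime = TimeBlock(^{
        for (NSString *s in strings) {
            if ([s rangeOfString:needle].location != NSNotFound) {
                loopHits++;
            }
        }
    });

    NSPredicate *contains = [NSPredicate predicateWithFormat:@"SELF CONTAINS %@", needle];
    __block NSUInteger predicateHits = 0;
    NSTimeInterval predicateTime = TimeBlock(^{
        predicateHits = [[strings filteredArrayUsingPredicate:contains] count];
    });

    // Both hit counts should agree; only the timings are expected to differ.
    NSLog(@"rangeOfString: %lu hits in %.4f s", (unsigned long)loopHits, loopTime);
    NSLog(@"NSPredicate:   %lu hits in %.4f s", (unsigned long)predicateHits, predicateTime);
}
```

Running it several times on data of realistic size, and in a release build, gives a more trustworthy picture than a single pass.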

For simplicity, I would go with a simple loop such as for (NSString *string in stringsArray) { }. The body of the loop would contain a simple rangeOfString: check. You might be able to improve performance by a few percent by using CFStringFind(), but you will only see the benefit if you are searching through a lot of strings. The advantage of CFStringFind() is that it avoids the (very small) Objective-C message dispatch overhead. Again, switching is usually only a win when you are searching "a lot" of strings (for some ever-changing value of "a lot"), and you should always benchmark to be sure. Prefer the simpler Objective-C rangeOfString: approach if you can.
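As a minimal sketch of the CFStringFind() variant, assuming ARC (for the __bridge casts) and an array that contains only NSString objects; the function name StringsContaining is hypothetical:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the strings in `strings` that contain `needle`.
static NSArray *StringsContaining(NSArray *strings, NSString *needle) {
    NSMutableArray *matches = [NSMutableArray array];
    CFStringRef cfNeedle = (__bridge CFStringRef)needle;

    for (NSString *candidate in strings) {
        // NSString is toll-free bridged to CFStringRef, so no conversion is needed.
        // Passing 0 for the compare flags requests a plain, case-sensitive search.
        CFRange found = CFStringFind((__bridge CFStringRef)candidate, cfNeedle, 0);

        // On failure, CFStringFind returns a range whose location is kCFNotFound.
        if (found.location != kCFNotFound) {
            [matches addObject:candidate];
        }
    }
    return matches;
}
```

Whether this beats the plain rangeOfString: loop on a given data set is something only a measurement can tell.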

A more complex approach is to use the ^ Blocks feature with the NSEnumerationConcurrent option. NSEnumerationConcurrent is only a hint that you would like the enumeration to happen concurrently if possible; an implementation is free to ignore the hint if it cannot support concurrent enumeration. However, a standard NSArray will most likely implement it. In practice, the effect is that the objects in the NSArray are divided up and spread across the available CPUs. You need to be careful about how you mutate state and objects that the ^ block accesses from multiple threads. Here is one possible way to do it:

    // Be sure to #include <libkern/OSAtomic.h>
    __block volatile OSSpinLock spinLock = OS_SPINLOCK_INIT;
    __block NSMutableArray *matchesArray = [NSMutableArray array];

    [stringsToSearchArray enumerateObjectsWithOptions:NSEnumerationConcurrent
                                           usingBlock:^(id obj, NSUInteger idx, BOOL *stop) {
        NSRange matchedRange = [obj rangeOfString:@"this"];
        if (matchedRange.location != NSNotFound) {
            OSSpinLockLock((volatile OSSpinLock * volatile)&spinLock);
            [matchesArray addObject:obj];
            OSSpinLockUnlock((volatile OSSpinLock * volatile)&spinLock);
        }
    }];
    // At this point, matchesArray will contain all the strings that had a match.

In this case, a lightweight OSSpinLock ensures that only one thread at a time accesses and updates matchesArray. The CFStringFind() suggestion from above applies here as well.
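OSSpinLock has since been deprecated (as of macOS 10.12 and iOS 10), so on current systems a different lock is needed. The sketch below swaps in NSLock, which is not what the answer above uses; it assumes ARC and an array of NSString objects, and the function name ConcurrentMatches is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: concurrent substring search guarded by an NSLock.
static NSArray *ConcurrentMatches(NSArray *stringsToSearch, NSString *needle) {
    NSMutableArray *matches = [NSMutableArray array];
    NSLock *lock = [[NSLock alloc] init];

    [stringsToSearch enumerateObjectsWithOptions:NSEnumerationConcurrent
                                      usingBlock:^(id obj, NSUInteger idx, BOOL *stop) {
        // The search itself touches no shared state, so it runs outside the lock.
        if ([obj rangeOfString:needle].location != NSNotFound) {
            // NSMutableArray is not thread-safe; serialize only the mutation.
            [lock lock];
            [matches addObject:obj];
            [lock unlock];
        }
    }];

    // With concurrent enumeration, the order of the results is not guaranteed.
    return matches;
}
```

NSLock is heavier than a spin lock, but the lock is only taken on a match, so the cost is likely to matter only when most strings match.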

In addition, you should be aware that rangeOfString: by itself does not match on word boundaries. In the example above, I used the word this, which would match the string A paleolithist walked in to the bar... even though it does not contain the word this.

The simplest solution to this small wrinkle is to use an ICU regular expression and take advantage of its "enhanced word breaking" functionality. To do this, you have several options:

  • NSRegularExpression, currently only available on iOS 4.2 or 4.3 and later (I forget which).
  • RegexKitLite, via RegexKitLite-4.0.tar.bz2
  • NSPredicate, via SELF MATCHES '(?w)\b...\b'. The advantage is that it requires nothing extra (i.e., RegexKitLite) and is available on all (?) versions of Mac OS X and on iOS 3.0 and later.

The following code shows how to use the enhanced word breaking functionality of ICU regular expressions through NSPredicate:

    NSString *searchForString = @"this";
    NSString *regexString = [NSString stringWithFormat:@".*(?w:\\b\\Q%@\\E\\b).*", searchForString];
    NSPredicate *wordBoundaryRegexPredicate = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", regexString];
    NSArray *matchesArray = [stringsToSearchArray filteredArrayUsingPredicate:wordBoundaryRegexPredicate];

You can make the search case-insensitive by replacing (?w: in regexString with (?wi:.
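For comparison, the NSRegularExpression option from the list above could look roughly like the sketch below. It assumes ARC and an array of NSString objects; the function name WholeWordMatches is hypothetical. The NSRegularExpressionUseUnicodeWordBoundaries option stands in for the (?w) flag, and escapedPatternForString: is used to escape the search text.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the strings that contain `word` as a whole word.
static NSArray *WholeWordMatches(NSArray *strings, NSString *word) {
    // Escape the search text so characters such as ? or . are matched literally.
    NSString *escaped = [NSRegularExpression escapedPatternForString:word];
    NSString *pattern = [NSString stringWithFormat:@"\\b%@\\b", escaped];

    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionUseUnicodeWordBoundaries
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Could not compile pattern: %@", error);
        return nil;
    }

    NSMutableArray *matches = [NSMutableArray array];
    for (NSString *candidate in strings) {
        // firstMatchInString: searches anywhere in the range, so no .* padding is needed.
        NSRange whole = NSMakeRange(0, [candidate length]);
        if ([regex firstMatchInString:candidate options:0 range:whole] != nil) {
            [matches addObject:candidate];
        }
    }
    return matches;
}
```

Adding NSRegularExpressionCaseInsensitive to the options would be the counterpart of switching to (?wi:.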

The regex, if you're interested, basically says:

  • .*(?w:...).* says "match anything before and after the (?w:...) part" (i.e., we are only interested in the (?w:...) part). This is needed because MATCHES has to match the entire string.
  • (?w:...) says "turn on ICU's enhanced word breaking inside the parentheses".
  • \\b...\\b (really just a single backslash each; every backslash has to be escaped with another backslash inside an @"" string literal) says "match at a word boundary".
  • \\Q...\\E says "treat the text starting immediately after \Q and up to \E as literal text" (think "Quote" and "End"). In other words, no character in the quoted literal text has any special regex meaning.

The reason for \Q...\E is that you probably want to match the literal characters in searchForString. Without it, searchForString would be treated as part of the regex. For example, if searchForString were this?, then without \Q...\E the regex would not match the literal string this? but rather thi or this, which is probably not what you want. :)

+18

Case (n): If you have an array of strings to check for a substring, it is best to use NSPredicate.

    NSPredicate *regex = [NSPredicate predicateWithFormat:@"SELF CONTAINS %@", substring];
    NSArray *resultArray = [originalArrayOfStrings filteredArrayUsingPredicate:regex];

This will return an array of strings that contain the substring.

If you use NSRange in this case, you need to loop over all the string objects in the array manually, and obviously that will be slower than NSPredicate.

+2
