How to speed up iPhone regular expressions using NSRegularExpression?

My iPhone application uses regular expressions (via NSRegularExpression) to perform calculations on a very large number of lines (in the thousands). This, of course, takes a lot of time. What are some strategies for speeding up the regular expressions? I looked into using blocks, but I don’t think they would help: they seem mainly to provide lambda functionality (i.e., the equivalent of a Lisp lambda) and to pay off on Macs with multiple cores. The current iPhone, obviously, does not have multiple cores.

Here is my code:

    NSString *replaceRegexPattern = @"([\\(|\\[].*?[\\)|\\]])|(^to )";
    NSRegularExpression *replaceRegex = [[NSRegularExpression regularExpressionWithPattern:replaceRegexPattern
                                                                                    options:NSRegularExpressionCaseInsensitive
                                                                                      error:nil] retain];
    NSArray *myArray = <some data>;
    NSString *myString, *compareValue;
    for (i = 0; i < [myArray count]; i++) {
        myString = [myArray objectAtIndex:i];
        compareValue = [replaceRegex stringByReplacingMatchesInString:myString
                                                              options:0
                                                                range:NSMakeRange(0, [myString length])
                                                         withTemplate:@""];
        // do things with compareValue
    }

To answer the question below: my goal in this code is to remove any text in a line that is either enclosed in parentheses or square brackets, or is a leading “to ”. Here are some examples:

  • Hello (Goodbye) → Hello
  • Hello (Goodbye [n]) → Hello
  • to Say → Say
  • Say (pf) → Say
3 answers

Since I don’t know what exactly you are trying to do, it’s hard to give reasonable advice, but it looks like your regular expression can be slightly improved.

Are you really trying to match strings like (foo) , [bar] and |baz| ? You do not need the alternation operator | inside character classes; there it is just a literal pipe character. So unless you want to match the third example, drop the | characters.
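A small sketch can confirm the point about | inside a character class. It is plain C and uses POSIX regcomp/regexec as a stand-in for NSRegularExpression, which is an assumption made only to keep the example self-contained; the helper name matches is hypothetical. The two patterns used here mean the same thing in both engines.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if `pattern` (POSIX extended syntax)
   matches anywhere in `text`. POSIX regex stands in for NSRegularExpression. */
bool matches(const char *pattern, const char *text)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;                 /* pattern failed to compile */
    bool found = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return found;
}
```

With this helper, "^[(|]" accepts a string that starts with a pipe, such as |baz|, while "^[(]" does not, which is why the | should be left out of the class unless pipes are really wanted.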

Then, since you expect strings like (foo [bar] baz) , you need to handle the two kinds of brackets separately, which also lets you speed up your regex a bit:

 @"^to |\\([^)]*\\)|\\[[^\\]]*\\]" 

First, "to " is checked at the start of the string. Otherwise the regex looks for an opening parenthesis or bracket, followed by any number of characters other than the matching closing parenthesis or bracket, followed by that closing parenthesis or bracket. This requires less backtracking, so it may be a little faster.

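You will not be able to handle nested parentheses / brackets of the same type, such as (foo (bar) baz) , with a single regular expression, because such nesting is no longer a regular language. The workaround is to run the replace operation several times, once for each nesting level, so the example above would need two passes. Note that this only works if each pass removes the innermost pair, which requires the character classes to exclude the opening bracket as well (for example \\([^()]*\\) ). With the pattern exactly as written above, the first pass removes (foo (bar) and leaves a stray baz) behind.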
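The multi-pass idea can be sketched as follows. This is a minimal illustration in plain C using POSIX regcomp/regexec rather than NSRegularExpression (an assumption made to keep it self-contained), and the function names remove_matches and strip_nested are hypothetical. The pattern is written in POSIX syntax and excludes opening brackets from the character classes, so each pass strips only the innermost groups; the equivalent pattern for NSRegularExpression would be @"\\([^()]*\\)|\\[[^\\[\\]]*\\]".

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Removes every non-overlapping match of `re` from `s` in place.
   Returns the number of matches removed. */
static size_t remove_matches(const regex_t *re, char *s)
{
    size_t removed = 0;
    char *read = s;    /* next unread input */
    char *write = s;   /* end of the output built so far */
    regmatch_t m;

    while (regexec(re, read, 1, &m, 0) == 0) {
        memmove(write, read, (size_t)m.rm_so);  /* keep text before the match */
        write += m.rm_so;
        read += m.rm_eo;                        /* skip the match itself */
        ++removed;
    }
    memmove(write, read, strlen(read) + 1);     /* keep the tail and the NUL */
    return removed;
}

/* Strips bracketed groups innermost-first, one nesting level per pass.
   Returns the number of passes that removed something, or -1 if the
   pattern fails to compile. */
int strip_nested(char *s)
{
    regex_t re;
    /* Innermost "(...)" or "[...]": no bracket of the same kind inside. */
    if (regcomp(&re, "\\([^()]*\\)|\\[[^][]*\\]", REG_EXTENDED) != 0)
        return -1;

    int passes = 0;
    while (remove_matches(&re, s) > 0)
        ++passes;

    regfree(&re);
    return passes;
}
```

Looping until a pass removes nothing avoids having to know the nesting depth in advance, at the cost of one extra scan of the string at the end.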


Are you sure regular expressions are the right tool for this?

If all you are trying to do is remove the text in parentheses, a simple character-by-character loop over the string can do this very easily, and it even handles nested parentheses correctly.

In pseudocode:

    nesting_level = 0;
    while more_chars {
        c = next_char;
        if (c == '(' or c == '[')
            ++nesting_level;
        else if (c == ')' or c == ']')
            --nesting_level;   // check for nesting_level < 0 here?
        else if (nesting_level == 0)
            result += c;
    }

Obviously do your own tests, but you might get better performance by avoiding regexes.
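As a rough illustration, the pseudocode above could look like this in plain C, which can be compiled as part of an Objective-C project. The function name strip_bracketed is hypothetical, and two choices here are assumptions rather than part of the pseudocode: a stray closing bracket is dropped without letting the nesting level go negative, and the leading "to " is not handled. The loop works on bytes, which is safe for UTF-8 input because the four bracket characters are ASCII and never occur inside a multi-byte sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Copies `src` into `dst`, skipping everything inside (...) or [...],
   including nested groups. `dst` must have room for strlen(src) + 1 bytes.
   Returns the number of bytes written, not counting the terminator. */
size_t strip_bracketed(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);

    int depth = 0;   /* current nesting level */
    size_t n = 0;    /* bytes written so far */

    for (const char *p = src; *p != '\0'; ++p) {
        char c = *p;
        if (c == '(' || c == '[') {
            ++depth;
        } else if (c == ')' || c == ']') {
            if (depth > 0)
                --depth;          /* a stray closer is simply dropped */
        } else if (depth == 0) {
            dst[n++] = c;         /* outside all brackets: keep it */
        }
    }
    dst[n] = '\0';
    return n;
}
```

Like the regex versions, this leaves surrounding whitespace alone, so "Hello (Goodbye)" becomes "Hello " with a trailing space that the caller may want to trim.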

(And if you care about detecting malformed input like "(hello]", you can extend this into a simple recursive-descent check.)


The best way to speed up this regular expression is to use possessive quantifiers:

 NSString *replaceRegexPattern = @"^to\\s++|\\[[^\\[\\]]*+\\]|\\([^()]*+\\)"; 

When no match is possible because an opening bracket has no matching closing bracket, *+ prevents backtracking, which we know would be pointless. Successful match attempts also become more efficient, because the regex engine does not need to store the state information that would make backtracking possible.

As Tim noted, this will not match nested instances of the same type of bracket, such as ((foo)) or [[bar]] . It will match any number of square brackets inside a pair of parentheses, or vice versa. It does not require those inner brackets to be correctly paired, so it will match, for example, (foo[) or [(bar))] . The same is true of your original regular expression.

Including the opening brackets in the negated character classes prevents matches that start at an unbalanced opening bracket, such as [[foo] or ((bar) .
