One of the tricks you can do to figure out what the regular expression engine does is:
use re 'debug';
eg:.
use strict; use warnings; use re 'debug'; my $str = "a" x 18; $str =~ m{ \(([^()]+|\( [^()]* \))+\)}x;
Then what the regular expression engine does is printed:
Compiling REx " \(([^()]+|\( [^()]* \))+\)" Final program: 1: EXACT <(> (3) 3: CURLYX[0] {1,32767} (40) 5: OPEN1 (7) 7: BRANCH (20) 8: PLUS (37) 9: ANYOF[^()][{above_bitmap_all}] (0) 20: BRANCH (FAIL) 21: EXACT <(> (23) 23: STAR (35) 24: ANYOF[^()][{above_bitmap_all}] (0) 35: EXACT <)> (37) 37: CLOSE1 (39) 39: WHILEM[1/3] (0) 40: NOTHING (41) 41: EXACT <)> (43) 43: END (0) anchored "(" at 0 floating ")" at 2..2147483647 (checking floating) minlen 3 Matching REx " \(([^()]+|\( [^()]* \))+\)" against "aaaaaaaaaaaaaaaaaa" Intuit: trying to determine minimum start position... doing 'check' fbm scan, [2..18] gave -1 Did not find floating substr ")"... Match rejected by optimizer Freeing REx: " \(([^()]+|\( [^()]* \))+\)"
If you add your brackets back, you will get a different result - I get about 2000 steps to process the regular expression. This increases with each additional letter - about 300 steps.
So, I would say yes - there is a catastrophic backtracking, but you may find that processors (and optimizing the regex engine) mean that the time is much shorter.
use strict; use warnings; use re 'debug'; my $str = "((()"."a" x 100_000; $str =~ m{ \(([^()]+|\( [^()]* \))+\)}x;
It works much longer - but, at least, this is due to the fact that printing text on the screen is actually quite "expensive".
My test shows (the number "a")
10 : 1221 lines of output (broadly correlated with regex steps) 11 : 1324 (+103) 12 : 1467 (+143) 13 : 1590 (+129) 14 : 1728 (+138) 15 : 1852 (+124) 20 : 2630 (approx +155/extra) 30 : 4536 (+190/extra) 40 : 6940 (+240/extra) 50 : 9846 (+290/extra) 100 - 31,846 (+440/extra letter)
So, of course, this looks like exponential behavior, but in any case, the processing time is still pretty fast, which I would attribute to a faster processor (and, possibly, better optimization of the regular expression mechanism)
Sobrique
source share