I wrote a small, naive regular expression that was supposed to find text inside parentheses:
re.search(r'\((.|\s)*\)', name)
I know this is not the best way to do it, for several reasons, but it worked fine. I'm only looking for an explanation of why, for some strings, this expression takes exponentially longer and effectively never finishes. Last night, after this code had been running for several months, one of our servers suddenly got stuck matching a string similar to the following:
x (y) z
I experimented with it and found that the time taken doubles for each space added between "(y)" and "z":
In [62]: %timeit re.search(r'\((.|\s)*\)', 'x (y)' + (22 * ' ') + 'z')
1 loops, best of 3: 1.23 s per loop

In [63]: %timeit re.search(r'\((.|\s)*\)', 'x (y)' + (23 * ' ') + 'z')
1 loops, best of 3: 2.46 s per loop

In [64]: %timeit re.search(r'\((.|\s)*\)', 'x (y)' + (24 * ' ') + 'z')
1 loops, best of 3: 4.91 s per loop
Characters other than a space, however, do not have the same effect:
In [65]: %timeit re.search(r'\((.|\s)*\)', 'x (y)' + (24 * 'a') + 'z')
100000 loops, best of 3: 5.23 us per loop
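For anyone who wants to reproduce this outside IPython, here is a rough sketch of a standalone script. The helper name `best_time` is something I made up for illustration, and the space counts are deliberately kept small so the script finishes quickly; the absolute timings will differ from machine to machine.

```python
import re
import timeit

# Same pattern as in the question, compiled once.
PATTERN = re.compile(r'\((.|\s)*\)')


def best_time(filler, count, repeat=3):
    """Return the best wall-clock time (seconds) of one search on
    'x (y)' + count * filler + 'z'. Hypothetical helper, not a library API."""
    subject = 'x (y)' + count * filler + 'z'
    return min(timeit.repeat(lambda: PATTERN.search(subject),
                             number=1, repeat=repeat))


if __name__ == '__main__':
    # Small counts only: each extra space should roughly double the time.
    for count in range(10, 17):
        spaces = best_time(' ', count)
        letters = best_time('a', count)
        print(f'{count:2d} fillers: spaces {spaces:.6f} s, letters {letters:.6f} s')
```

If the behaviour is the same as on our server, the "spaces" column should roughly double from one row to the next while the "letters" column stays flat.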
Note: I am not looking for a better regex or other solution to this problem. We no longer use it.
python regex
fletom