In Python regular expressions
re.compile("x"*50000)
gives me OverflowError: regular expression code size limit exceeded
but the following one does not fail, although it drives the CPU to 100% and takes about 1 minute on my PC.
>>> re.compile(".*?.*?.*?.*?.*?.*?.*?.*?.*?.*?"*50000)
<_sre.SRE_Pattern object at 0x03FB0020>
Is this normal?
Is ".*?.*?.*?.*?.*?.*?.*?.*?.*?.*?"*50000 somehow considered shorter than "x"*50000 ?
Tested on Python 2.6, Win32
UPDATE 1 :
It seems that ".*?.*?.*?.*?.*?.*?.*?.*?.*?.*?"*50000 can be reduced to .*?
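To sanity-check that reduction on a small scale, here is a rough sketch. The repetition count of 5, the sample strings, and the trailing $ anchor are my own choices for illustration, not part of the original test:

```python
import re

# Illustrative repetition count; the pattern in question uses far more.
N = 5

single_lazy = re.compile(".*?$")
repeated_lazy = re.compile(".*?" * N + "$")

# Sample subjects chosen for illustration (no newlines, since "." skips them).
samples = ["", "x", "abcx", "hello world"]

# For each subject, check whether both patterns agree on match / no match.
agree = [
    bool(single_lazy.match(s)) == bool(repeated_lazy.match(s))
    for s in samples
]
print(agree)
```

On these samples both patterns accept exactly the same strings, which is at least consistent with the idea that the repeated form adds nothing beyond a single .*? .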
So how about this?
re.compile(".*?x"*50000)
It compiles, and if it can also be reduced to ".*?x" , it should match the string "abcx" or just "x" , but it matches neither.
So, am I missing something?
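For what it's worth, a scaled-down check suggests the two are not equivalent, since each repetition seems to need its own literal x. This is only a sketch: it uses 3 repetitions instead of 50000, and the count and the test strings are my own choices:

```python
import re

# Small repetition count for illustration (the question uses 50000).
N = 3

single = re.compile(".*?x")
repeated = re.compile(".*?x" * N)  # same as ".*?x.*?x.*?x"

# Every repetition has to consume one literal "x", so the repeated
# pattern can only match subjects containing at least N "x" characters.
for subject in ["x", "abcx", "axbxcx", "xxx"]:
    print("%s single=%s repeated=%s" % (
        subject,
        bool(single.match(subject)),
        bool(repeated.match(subject)),
    ))
```

Here "x" and "abcx" match only the single pattern, while "axbxcx" and "xxx" match both.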
UPDATE 2 :
My point is not to find out the maximum length of the regex source. I would like to know the reasons / concepts behind "x"*50000 being caught by the overflow handler while ".*?x"*50000 is not.
It does not make sense to me, which is why I am asking.
Is something missing in the overflow check, is this just fine, or is something really overflowing?
Any advice / opinions would be appreciated.