Maximum Regular Expression Match Length

What is the easiest way to determine the maximum regular expression match length?

In particular, I am using the Python re module.

E.g., for foo((bar){2,3}|potato) it would be 12.
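To spell out where the 12 comes from: "foo" contributes 3 characters, and the longer alternative is "bar" repeated three times (9 characters, versus 6 for "potato"). The short check below is only an illustration of that arithmetic; the variable names are made up for this example.

```python
import re

pattern = re.compile(r'foo((bar){2,3}|potato)')

# Longest candidate: the literal prefix plus the maximum repetition of "bar".
longest = 'foo' + 'bar' * 3

# The candidate matches in full, and one more repetition no longer does.
assert pattern.fullmatch(longest) is not None
assert pattern.fullmatch(longest + 'bar') is None

print(len(longest))  # 12
```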

Obviously, regular expressions that use operators such as * and + have theoretically unbounded match lengths; in those cases, returning an error or something similar is fine. Raising an error for regular expressions that use (?...) extensions is also fine.

I would also be fine with an approximate upper bound, as long as it is never smaller than the actual maximum length and not too much larger.

2 answers
I think I solved it. Thanks to unutbu for pointing out sre_parse!
    import sre_parse

    def get_regex_max_match_len(regex):
        minlen, maxlen = sre_parse.parse(regex).getwidth()
        if maxlen >= sre_parse.MAXREPEAT:
            raise ValueError('unbounded regex')
        return maxlen

Results in:

    >>> get_regex_max_match_len('foo((bar){2,3}|potato)')
    12
    >>> get_regex_max_match_len('.*')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 3, in get_regex_max_match_len
    ValueError: unbounded regex
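On newer Python versions (3.11 and later, as far as I know), sre_parse is deprecated and the parser lives in the private module re._parser. Below is a minimal sketch of the same idea with a fallback import. The name max_match_len is only illustrative, and because re._parser is private, this may break in a future release.

```python
try:
    # Python 3.11+: the parser is a private submodule of re (assumption: may change).
    from re import _parser as sre_parse
except ImportError:
    # Older versions: the top-level module used in the answer.
    import sre_parse


def max_match_len(regex):
    """Return the maximum match length of regex; raise ValueError if unbounded."""
    _minlen, maxlen = sre_parse.parse(regex).getwidth()
    # getwidth() caps the upper bound at MAXREPEAT for unbounded repeats.
    if maxlen >= sre_parse.MAXREPEAT:
        raise ValueError('unbounded regex')
    return maxlen
```

Calling max_match_len('foo((bar){2,3}|potato)') should give 12 here as well, and patterns such as '.*' should raise ValueError.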

Using the invRegex module from pyparsing:

    import invRegex

    data = 'foo(bar{2,3}|potato)'
    print(list(invRegex.invert(data)))
    # ['foobarr', 'foobarrr', 'foopotato']
    print(max(map(len, invRegex.invert(data))))
    # 9

Another alternative is to use ipermute from this module.

    import inverse_regex

    data = 'foo(bar{2,3}|potato)'
    print(list(inverse_regex.ipermute(data)))
    # ['foobarr', 'foobarrr', 'foopotato']
    print(max(map(len, inverse_regex.ipermute(data))))
    # 9