"Nothing to repeat" error from a Python regular expression

Here is the regex, first tried with egrep and then with Python 2.7:

$ echo '/some/path/to/file/abcde.csv' | egrep '*([a-zA-Z]+).csv'

/some/path/to/file/abcde.csv

However, the same regex in Python:

re.match(r'*([a-zA-Z]+)\.csv',f ) 

gives:

 Traceback (most recent call last):
   File "/shared/OpenChai/bin/plothost.py", line 26, in <module>
     hosts = [re.match(r'*([a-zA-Z]+)\.csv',f ).group(1) for f in infiles]
   File "/usr/lib/python2.7/re.py", line 141, in match
     return _compile(pattern, flags).match(string)
   File "/usr/lib/python2.7/re.py", line 251, in _compile
     raise error, v # invalid expression
 sre_constants.error: nothing to repeat

A search suggests that a Python bug may be involved:

regex error - nothing to repeat

That question says: "This seems to be a Python bug (it works fine in vim). The source of the problem is the (\s*...)+ bit."

However, I do not understand: what is the workaround for my regex shown above to make python happy?

Thanks.

2 answers

You do not need the * at the start of the pattern; it is what causes the problem.

Use

 ([a-zA-Z]+)\.csv 

Or, to match the entire string:

 .*([a-zA-Z]+)\.csv 
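
One thing to keep in mind when trying the first pattern: re.match only matches at the beginning of the string, so for a full path it is re.search that finds the name. A minimal sketch, using the sample path from the question (the variable names are only for illustration):

```python
import re

path = '/some/path/to/file/abcde.csv'  # sample path from the question
pattern = r'([a-zA-Z]+)\.csv'

# re.match anchors at position 0, where the path has '/', so it finds nothing.
anchored = re.match(pattern, path)

# re.search scans forward and finds the run of letters just before ".csv".
found = re.search(pattern, path)
host = found.group(1) if found else None

print(anchored)
print(host)
```

With re.match, the second pattern (the one starting with .*) is the form that can reach the end of the path.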

The reason is that the * is unescaped and is therefore treated as a quantifier. A quantifier applies to the preceding subpattern in the regular expression. Here it appears at the very start of the pattern, so there is nothing for it to quantify, which is why "nothing to repeat" is raised.
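
The behaviour described above can be checked directly. This is a small sketch, not part of the original answer; compile_error is a hypothetical helper written for this example, and the exact wording of the message can differ between Python versions (newer versions append the position).

```python
import re

def compile_error(pattern):
    """Return the error message from re.compile, or None if the pattern is valid."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None

# A leading * has no preceding subpattern, so compilation fails.
leading_star = compile_error(r'*([a-zA-Z]+)\.csv')

# Escaping the star makes it a literal character; giving it something to
# repeat (here the dot) makes it a valid quantifier.
escaped_star = compile_error(r'\*([a-zA-Z]+)\.csv')
dot_star = compile_error(r'.*([a-zA-Z]+)\.csv')

print(leading_star)
print(escaped_star, dot_star)
```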

If it "works" in Vim, that is only because the Vim regex engine ignores this subpattern (much as Java does with an unescaped [ and ] inside a character class, for example [([)]] ).

This is not a bug. Python's regex engine uses a traditional NFA to match patterns, and the * character only works when it is preceded by a token.

'*'

Causes the resulting RE to match 0 or more repetitions of the preceding RE, as many repetitions as are possible. ab* will match 'a', 'ab', or 'a' followed by any number of 'b's.
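
To make the quoted rule concrete, here is a short sketch of ab* in action. The sample strings are made up for illustration, and the trailing $ is an addition so that each sample has to match as a whole.

```python
import re

# '$' is added here so that the entire sample string must match ab*.
ab_star = re.compile(r'ab*$')

samples = ['a', 'ab', 'abbb', 'b']
results = [bool(ab_star.match(s)) for s in samples]

# Without the '$', the * is greedy: it takes every 'b' it can and stops at 'c'.
greedy_part = re.match(r'ab*', 'abbbc').group(0)

print(results)
print(greedy_part)
```

The last sample fails because * repeats only the b; the leading a is still required.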

So you can use .* , which repeats any character ( . ):

 r'.*([a-zA-Z]+)\.csv' 
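
A caveat worth checking if group(1) is used, as in the question's list comprehension: the greedy .* takes as much of the path as it can, which leaves only the last letter for the capture group. The sketch below shows that, along with two possible adjustments (anchoring on the final / or making .* lazy). Both adjusted patterns are suggestions for this example, not part of the original answer.

```python
import re

path = '/some/path/to/file/abcde.csv'  # sample path from the question

# Greedy .* backtracks only as far as it must, so the group gets one letter.
greedy = re.match(r'.*([a-zA-Z]+)\.csv', path).group(1)

# Requiring a '/' right before the group pins it to the whole file name.
after_slash = re.match(r'.*/([a-zA-Z]+)\.csv', path).group(1)

# A lazy .*? grows only until the rest of the pattern can match.
lazy = re.match(r'.*?([a-zA-Z]+)\.csv', path).group(1)

print(greedy, after_slash, lazy)
```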

Python also provides the fnmatch module, which supports Unix shell-style wildcards.

 >>> import fnmatch
 >>> s="/some/path/to/file/abcde.csv"
 >>> fnmatch.fnmatch(s, '*.csv')
 True
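
If the goal is the list of host names from the question, fnmatch can be combined with os.path to avoid a regex altogether. This is only a sketch under assumptions: the contents of infiles are invented for illustration, and it assumes the host name is simply the file name without its .csv extension.

```python
import fnmatch
import os.path

# Hypothetical input list; the question does not show what infiles contains.
infiles = ['/some/path/to/file/abcde.csv', '/some/path/to/file/notes.txt']

# Keep only the names that match the shell-style pattern.
csv_files = fnmatch.filter(infiles, '*.csv')

# Strip the directory part and the extension to get the bare name.
hosts = [os.path.splitext(os.path.basename(f))[0] for f in csv_files]

print(hosts)
```

Unlike the [a-zA-Z]+ group, this accepts any characters in the file name, so it is a looser filter than the regex.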