Python: comparing a string against multiple regexes

I am pretty experienced with Perl and Ruby, but new to Python, so I hope someone can show me the Pythonic way to accomplish the following task. I want to compare multiple lines against multiple regular expressions and retrieve the matching group. In Ruby, it would be something like this:

    # Revised to show variance in regex and related action.
    data, foo, bar = [], nil, nil
    input_lines.each do |line|
      if line =~ /Foo(\d+)/
        foo = $1.to_i
      elsif line =~ /Bar=(.*)$/
        bar = $1
      elsif bar
        data.push(line.to_f)
      end
    end

My attempts in Python have turned out pretty ugly, because the match group comes back from the call to match/search on the regular expression, and Python has no assignment in conditional expressions and no switch statement. What is the Pythonic way to approach (or think about!) this problem?

python regex switch-statement
4 answers

Paul McGuire's solution, which uses an intermediate REMatcher class that performs the match, saves the match object, and returns a boolean for success or failure, turned out to produce the most legible code for this purpose.
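A minimal sketch of what such a helper might look like, applied to the Foo/Bar task from the question. This is an illustration only, not Paul McGuire's original code: the body of `REMatcher`, the `parse` function, and the sample input are all assumptions.

```python
import re


class REMatcher(object):
    """Hypothetical helper: runs a search, remembers the match, returns a bool."""

    def __init__(self, text):
        self.text = text
        self.mo = None  # last match object, or None if the search failed

    def search(self, pattern):
        self.mo = re.search(pattern, self.text)
        return self.mo is not None

    def group(self, index):
        return self.mo.group(index)


def parse(input_lines):
    """Mirror the Ruby loop from the question (assumed structure)."""
    data, foo, bar = [], None, None
    for line in input_lines:
        m = REMatcher(line)
        if m.search(r'Foo(\d+)'):
            foo = int(m.group(1))
        elif m.search(r'Bar=(.*)$'):
            bar = m.group(1)
        elif bar is not None:
            # Unlike Ruby's to_f, float() raises ValueError on non-numeric text.
            data.append(float(line))
    return data, foo, bar
```

For example, `parse(['Foo42', 'Bar=baz', '1.5'])` returns `([1.5], 42, 'baz')`. Because the matcher keeps the match object on itself, the `if`/`elif` chain can test and then read the group without a separate assignment statement.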


Something like this, but prettier:

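    import re

    regexes = [re.compile('...'), ...]  # fill in the actual patterns

    for regex in regexes:
        m = regex.match(s)  # s is the string being tested
        if m:
            print(m.groups())
            break
    else:
        # for-else: runs only if the loop finished without hitting break
        print('No match')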

In Python, there are several ways to "bind a name on the fly", such as my old recipe for "assign and test". In this case I would probably choose another such way (assuming Python 2.6; minor changes are needed if you are working with an older version of Python), something like:

    import re

    # (pattern, label) pairs; the table drives the matching
    pats_mark = (r'^A:(.*)$', 'FOO'), (r'^B:(.*)$', 'BAR')

    for line in lines:
        # first (match object, label) pair that matches, else (None, None)
        mo, m = next(((mo, m) for p, m in pats_mark
                      for mo in [re.match(p, line)] if mo),
                     (None, None))
        if mo:
            print('%s: %s' % (m, mo.group(1)))
        else:
            print('NO MATCH: %s' % line)

Of course, many small details can be tweaked. For example, I chose (.*) rather than (.*?) as the matching group: they are equivalent given the immediately following $, so I chose the shorter form ;-). You could also precompile the REs, factor things differently than the pats_mark tuple (for example, with a dict keyed by RE pattern), etc.

But the essential ideas, I think, are to make the structure data-driven, and to bind the match object to a name "on the fly" with the subexpression for mo in [re.match(p, line)] , a "loop" over a single-item list. Genexps bind names only by looping, not by assignment. Some consider using this part of the genexp specification "tricky", but I consider it a perfectly acceptable Python idiom, especially since it was considered back when listcomps, the "ancestors" of genexps in a sense, were being designed.
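A rough sketch of those two ideas with the patterns precompiled, as suggested above. The names `PATS_MARKS` and `classify` are hypothetical, and the sample lines are made up for illustration; the patterns are the ones from the earlier snippet.

```python
import re

# Data-driven table: (compiled pattern, label) pairs.
PATS_MARKS = (
    (re.compile(r'^A:(.*)$'), 'FOO'),
    (re.compile(r'^B:(.*)$'), 'BAR'),
)


def classify(line):
    """Return (label, captured text) for the first matching pattern, else None."""
    return next(
        ((mark, mo.group(1))
         for pat, mark in PATS_MARKS
         for mo in [pat.match(line)]  # one-item list: binds mo "on the fly"
         if mo),
        None
    )


for line in ['A:one', 'B:two', 'C:three']:
    print('%s -> %r' % (line, classify(line)))
```

Adding a new pattern then only means adding a row to the table; the matching logic itself does not change.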


Your regex just takes everything from the third character onward, so plain string methods are enough:

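    with open("file") as f:
        for line in f:
            line = line.rstrip("\n")  # drop the newline so it is not printed inside the braces
            if line.startswith("A:"):
                print("FOO #{" + line[2:] + "}")
            elif line.startswith("B:"):
                print("BAR #{" + line[2:] + "}")
            else:
                print("No match")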