Is it possible to get an arbitrary unordered set of named groups in one fell swoop with the Python re module?

This is very convenient for some problems:

    >>> re.search('(?P<b>.b.).*(?P<i>.i.)', 'abcdefghijk').groupdict()
    {'i': 'hij', 'b': 'abc'}

But what if I don’t know what order to expect ahead of time?

[update]

For example, I have an input variable containing characters in an unknown order, and it so happens that "b" appears after "i". I still want to be able to refer to the groups for ".b." and ".i." without having to order my regular expression according to their order in the input variable. So I would like to do something like this, but I don't know whether it is possible:

    >>> re.search('(?P<b>.b.)|(?P<i>.i.)', unknown_order_alphabet_str).groupdict()
    {'i': 'hij', 'b': 'abc'}
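For comparison, here is what that call actually returns on the sample string from above. `re.search` stops at the first position where either branch of the alternation fits and returns only that single match, so one group is filled in and the other is `None`. A minimal sketch:

```python
import re

# The alternation matches at the first position where EITHER branch fits,
# and re.search returns only that one match object.
m = re.search(r'(?P<b>.b.)|(?P<i>.i.)', 'abcdefghijk')
print(m.groupdict())  # {'b': 'abc', 'i': None}
```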

[final update]

I searched around and racked my brain a bunch, but couldn't come up with anything good. I'm guessing this functionality doesn't exist, because probably the only way to do it would be to scan the entire string once per group (which, of course, I could do in a loop instead), but I thought I'd see what others had to say about this one.

Thank you for your help,
Josh

+6
python regex unordered
5 answers

Use the vertical bar (“or”) in the RE and finditer to get all the match objects of interest: each will have a groupdict with None as the value for the groups not participating in that match, and you can “merge” the dicts however you prefer.

For example:

    import re

    def mergedgroupdict(pattern, thestring):
        there = re.compile(pattern)
        result = {}
        for mo in there.finditer(thestring):
            d = mo.groupdict()
            # keep only the first non-None value seen for each named group
            for k in d:
                if k not in result and d[k] is not None:
                    result[k] = d[k]
        return result

The merge strategy used here is to keep the first actual match for each named group in the pattern. Now, for example:

    >>> mergedgroupdict('(?P<b>.b.)|(?P<i>.i.)', 'abcdefghijk')
    {'i': 'hij', 'b': 'abc'}
    >>> mergedgroupdict('(?P<b>.b.)|(?P<i>.i.)', 'abcdefghijk'[::-1])
    {'i': 'jih', 'b': 'cba'}

which is presumably what you want, if I interpret your question correctly.

+1
    >>> [m.groupdict() for m in re.finditer('(?P<b>.b.)|(?P<i>.i.)', 'abcdefghijk')]
    [{'i': None, 'b': 'abc'}, {'i': 'hij', 'b': None}]

This seems to work, although if you have a lot of groups, checking which ones are not None might get tedious.

This finds every .b. and every .i. in the string. If you want to be sure that each one was found, you will also need to check that manually.
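One way to avoid testing every group for None (not something the answers above use) is `Match.lastgroup`, which gives the name of the last capturing group that participated in the match; with a plain alternation of named groups, that is the branch that fired. The "was everything found?" check can compare against the compiled pattern's `groupindex`, which maps every named group to its number. A sketch under those assumptions; `merge_first` is a hypothetical helper name:

```python
import re

def merge_first(pattern, text):
    """Map each named group to its first match; raise if any group never matched.

    Assumes the pattern is an alternation whose branches are each one named group.
    """
    compiled = re.compile(pattern)
    result = {}
    for m in compiled.finditer(text):
        # lastgroup names the branch that matched; setdefault keeps the first hit.
        result.setdefault(m.lastgroup, m.group(m.lastgroup))
    # groupindex lists every named group in the pattern, matched or not.
    missing = set(compiled.groupindex) - set(result)
    if missing:
        raise ValueError('groups never matched: %s' % sorted(missing))
    return result

print(merge_first(r'(?P<b>.b.)|(?P<i>.i.)', 'abcdefghijk'))  # {'b': 'abc', 'i': 'hij'}
```

Raising is one design choice; returning the partial dict and the set of missing names would work just as well if a missing group is not an error for you.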

0

The closest I can get is this:

    >>> [match.groupdict() for match in re.finditer('(?P<b>.b.)|(?P<i>.i.)', 'abcdefghijk')]
    [{'i': None, 'b': 'abc'}, {'i': 'hij', 'b': None}]

How you combine the dictionaries then depends on whether you expect more than one match. If you only expect one match per group, you can do:

    >>> results = {}
    >>> for match in re.finditer('(?P<b>.b.)|(?P<i>.i.)', 'abcdefghijk'):
    ...     results.update(dict((k, v) for k, v in match.groupdict().items() if v is not None))
    ...
    >>> results
    {'i': 'hij', 'b': 'abc'}

Or, for multiple matches:

    >>> from collections import defaultdict
    >>> results = defaultdict(lambda: [])
    >>> for match in re.finditer('(?P<b>.b.)|(?P<i>.i.)', 'abcdefghijkabcdefghijk'):
    ...     for k, v in match.groupdict().items():
    ...         if v is not None:
    ...             results[k].append(v)
    ...
    >>> results
    defaultdict(<function <lambda> at 0x7f53d0992c08>, {'i': ['hij', 'hij'], 'b': ['abc', 'abc']})
0

Here is a method that does not require finditer and dictionary merging:

    >>> pat = re.compile(r'(?:.*?(?:(?P<b>.b.)|(?P<i>.i.))){2}')
    >>> pat.search('abcdefghijk').groupdict()
    {'i': 'hij', 'b': 'abc'}
    >>> pat.search('aicdefghbjk').groupdict()
    {'i': 'aic', 'b': 'hbj'}

This assumes that each of the characters b and i appears exactly once in your string. Otherwise:

  • If one of the characters may be missing, you can use {,2} instead of {2}.
  • If one of the characters appears more than once, the search will extract the first two occurrences of any of them (for example, it can find b twice, and not find i at all).
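Both caveats can be checked directly against the same pattern. A short sketch; the sample strings 'abcdefg' and 'abcabchij' are made up for illustration:

```python
import re

# {,2} (that is, {0,2}) lets the repetition stop after one branch,
# so a string with no ".i." still matches and 'i' is simply None.
optional = re.compile(r'(?:.*?(?:(?P<b>.b.)|(?P<i>.i.))){,2}')
print(optional.search('abcdefg').groupdict())   # {'b': 'abc', 'i': None}

# With {2}, a repeated "b" can use up both repetitions,
# so 'i' is never captured even though "hij" is present.
exact = re.compile(r'(?:.*?(?:(?P<b>.b.)|(?P<i>.i.))){2}')
print(exact.search('abcabchij').groupdict())    # {'b': 'abc', 'i': None}
```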
0

Here is a late entry to the game: a one-liner that should be readable for beginners:

    >>> dict([(name, re.search(pattern, "abcdefghijk").group()) for name, pattern in {"b": ".b.", "i": ".i."}.items()])
    {'b': 'abc', 'i': 'hij'}
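That one-liner calls .group() directly on the result of re.search, so it raises AttributeError if one of the patterns does not occur at all. A slightly more defensive sketch of the same one-search-per-group idea, mapping a missing pattern to None; `search_each` is a hypothetical helper name and the `patterns` dict is just the example's:

```python
import re

patterns = {'b': '.b.', 'i': '.i.'}

def search_each(patterns, text):
    """Run one re.search per named pattern; patterns that never occur map to None."""
    result = {}
    for name, pattern in patterns.items():
        m = re.search(pattern, text)
        result[name] = m.group() if m else None
    return result

print(search_each(patterns, 'abcdefghijk'))  # {'b': 'abc', 'i': 'hij'}
print(search_each(patterns, 'abcdefg'))      # {'b': 'abc', 'i': None}
```

Like the one-liner, this scans the string once per pattern, which is the cost the question anticipated, but it does not care what order the pieces appear in.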
0
