Matching multiple regex patterns with the alternation operator?

I am having a little problem using Python Regex.

Suppose this is an input:

(zyx)bc 

What I'm trying to achieve is to get everything that is between the parentheses as one match, and any char outside as an individual match. The desired result will look like this:

 ['zyx','b','c'] 

The order of matches must be preserved.

I tried to get this using Python 3.3, but cannot figure out the correct regex. So far I have:

 matches = findall(r'\((.*?)\)|\w', '(zyx)bc') 

print(matches) gives the following:

 ['zyx','',''] 

Any ideas what I'm doing wrong?

5 answers

From the re.findall documentation:

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.

Your regular expression does match the string three times, but the group (.*?) is empty for the last two matches. If you want the output of the other half of the regular expression as well, you can add a second group:

 >>> re.findall(r'\((.*?)\)|(\w)', '(zyx)bc')
 [('zyx', ''), ('', 'b'), ('', 'c')]

Alternatively, you can delete all groups to get a simple list of strings again:

 >>> re.findall(r'\(.*?\)|\w', '(zyx)bc')
 ['(zyx)', 'b', 'c']

You will then need to remove the parentheses manually.
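As a hedged alternative to the two options above (a sketch, not part of the original answer): re.finditer yields match objects, and on a match object a group that did not take part in the match is None rather than ''. That makes it possible to keep the single-group pattern and pick the right piece per match. The names text, pattern and tokens are illustrative only.

```python
import re

text = '(zyx)bc'
pattern = re.compile(r'\((.*?)\)|\w')

# For each match, use group 1 (the text inside the parentheses) when that
# branch matched; otherwise fall back to the whole match (a single \w char).
tokens = [
    m.group(1) if m.group(1) is not None else m.group(0)
    for m in pattern.finditer(text)
]
print(tokens)  # ['zyx', 'b', 'c']
```

Testing against None instead of truthiness means an empty pair of parentheses still produces an empty string rather than the literal '()'.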


Let's look at how the pattern is parsed, using re.DEBUG.

 branch
   literal 40
   subpattern 1
     min_repeat 0 65535
       any None
   literal 41
 or
   in
     category category_word

Ouch, there is only one subpattern, and re.findall returns only the subpattern if one exists!

 a = re.findall(r'\((.*?)\)|(.)', '(zyx)bc', re.DEBUG); a
 [('zyx', ''), ('', 'b'), ('', 'c')]
 branch
   literal 40
   subpattern 1
     min_repeat 0 65535
       any None
   literal 41
 or
   subpattern 2
     any None

That's better. :)

Now we just need to get this into the right format.

 [i[0] if i[0] != '' else i[1] for i in a]
 ['zyx', 'b', 'c']
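The group count that the re.DEBUG dump reveals can also be read from the compiled pattern's groups attribute, which may be a quicker check. This is a small sketch; the variable names are illustrative and not from the answer.

```python
import re

# Number of capturing groups in each variant of the pattern.
no_groups = re.compile(r'\(.*?\)|\w').groups
one_group = re.compile(r'\((.*?)\)|\w').groups
two_groups = re.compile(r'\((.*?)\)|(.)').groups

print(no_groups, one_group, two_groups)  # 0 1 2
```

With 0 groups re.findall returns the whole matches, with 1 group it returns only that group's text, and with 2 or more it returns one tuple per match.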

The documentation mentions that groups are treated specially, so don't put a group inside the parenthesized part of the pattern and you will get everything, but you will need to remove the parentheses from the matched data yourself:

 >>> re.findall(r'\(.+?\)|\w', '(zyx)bc')
 ['(zyx)', 'b', 'c']

or use more groups, then process the resulting tuples to get the strings you want:

 >>> [''.join(t) for t in re.findall(r'\((.+?)\)|(\w)', '(zyx)bc')]
 ['zyx', 'b', 'c']
 In [108]: strs="(zyx)bc"
 In [109]: re.findall(r"\(\w+\)|\w",strs)
 Out[109]: ['(zyx)', 'b', 'c']
 In [110]: [x.strip("()") for x in re.findall(r"\(\w+\)|\w",strs)]
 Out[110]: ['zyx', 'b', 'c']

The other answers show how to get the result you need, but with the additional step of manually removing the parentheses. If you use lookarounds in your regex, you do not need to strip the parentheses yourself:

 >>> import re
 >>> s = '(zyx)bc'
 >>> print (re.findall(r'(?<=\()\w+(?=\))|\w', s))
 ['zyx', 'b', 'c']

Explanation:

 (?<=\()  // lookbehind for left parenthesis
 \w+      // one or more word characters, up to:
 (?=\))   // lookahead for right parenthesis
 |        // OR
 \w       // any single word character
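The breakdown above can also be kept inside the pattern itself with the re.VERBOSE flag, which ignores whitespace in the pattern and allows # comments. This is a sketch of the same regex, not a different technique; the name pattern is illustrative.

```python
import re

pattern = re.compile(r"""
    (?<=\()   # lookbehind: preceded by a left parenthesis
    \w+       # one or more word characters
    (?=\))    # lookahead: followed by a right parenthesis
    |         # OR
    \w        # a single word character
""", re.VERBOSE)

print(pattern.findall('(zyx)bc'))  # ['zyx', 'b', 'c']
```

Because the first branch uses \w+, only runs of word characters between parentheses are returned as one match; content containing spaces or punctuation would fall through to the single-character branch.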