The difference between two regular expressions: [abc]+ and ([abc])+

In [29]: re.findall("([abc])+","abc")
Out[29]: ['c']

In [30]: re.findall("[abc]+","abc")
Out[30]: ['abc']

The only difference is the grouping. Why does that matter?

+7
python regex
5 answers

Two things need explaining here: how quantified groups behave, and how the findall() method is designed.

In the first example, [abc] matches a, which is captured in group 1. Then it matches b and captures that in group 1, overwriting a. The same happens again with c, and that is what remains in group 1 at the end of the match.

But the regex does match the whole string. If you used search() or finditer(), you could inspect the match object and see that group(0) contains abc and group(1) contains c. But findall() returns strings, not match objects. If the pattern has no groups, it returns a list of the overall matches; if it has groups, the list contains the captures but not the overall match.
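As a quick check of the point above, here is a minimal sketch comparing search() with findall() on the question's input. The variable names are only illustrative.

```python
import re

text = "abc"

# search() returns a match object, so the overall match and the group are both visible.
m = re.search(r"([abc])+", text)
whole_match = m.group(0)   # the entire matched text
last_capture = m.group(1)  # only the final iteration of the repeated group

# findall() returns the group's contents when the pattern has a group,
# and the overall match when it has none.
with_group = re.findall(r"([abc])+", text)
without_group = re.findall(r"[abc]+", text)

print(whole_match, last_capture)  # abc c
print(with_group, without_group)  # ['c'] ['abc']
```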

So both of your regexes match the whole string, but the first one also captures and then discards each character along the way (which is pointless). It is just the unexpected behavior of findall() that makes it look like you are getting different results.

+8

In the first example, you have a repeated capturing group, which only captures the last iteration. Here, that is c.

 ([abc])+ 

In the second example, you match a single character from the set, between one and unlimited times.

 [abc]+ 

+7

This is how I would think about it. ([abc])+ repeats a capturing group. Putting a “+” after a capturing group does not mean you get one captured group per repetition. What happens, at least in Python's regex engine and most other implementations, is that the “+” keeps iterating, and when it finishes the capturing group holds only the last match.

If you want to capture the whole repeated run, swap the order of "(...)" and "+": for example, instead of ([abc])+ use ([abc]+).
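A small sketch of that swap, using findall() as in the question. The variable names are made up for illustration.

```python
import re

text = "abc"

# The quantifier is outside the group: the group is overwritten on each iteration.
repeated_group = re.findall(r"([abc])+", text)

# The quantifier is inside the group: the group captures the whole run.
group_of_repeat = re.findall(r"([abc]+)", text)

print(repeated_group)   # ['c']
print(group_of_repeat)  # ['abc']
```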

+2

input "abc"

 [abc] 

matches one character => "a"

 [abc]+ 

+ matches between one and unlimited times, as many times as possible => "abc"

 ([abc]) 

Capturing group ([abc]) => "a"

 ([abc])+ 

+ on a capturing group: the repeated group only captures the last iteration => "c"
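The four cases above can be reproduced with re.search(), reading group(0) for the ungrouped patterns and group(1) for the grouped ones. This is only a sketch, and the results dictionary is an illustrative name.

```python
import re

text = "abc"

results = {
    "[abc]": re.search(r"[abc]", text).group(0),        # single character
    "[abc]+": re.search(r"[abc]+", text).group(0),      # greedy run of characters
    "([abc])": re.search(r"([abc])", text).group(1),    # group captured once
    "([abc])+": re.search(r"([abc])+", text).group(1),  # group keeps last iteration
}

for pattern, value in results.items():
    print(pattern, "=>", repr(value))
```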

0

Grouping just changes how the pattern is evaluated.

([abc])+ => Match one of the choices, one or more times, since + means 1 or more. The engine finds that all the conditions are met. This breaks the regular expression into two stages.

Without the group, the expression is treated as a whole.

-3