The difference between two regular expressions: [abc]+ and ([abc])+

Question


 In [29]: re.findall("([abc])+", "abc")
 Out[29]: ['c']

 In [30]: re.findall("[abc]+", "abc")
 Out[30]: ['abc']

I'm confused by the grouped one. How does the grouping make a difference?

+7

python regex

user3015347 Feb 28 '16 at 2:11


5 answers

In the first example you have a repeated capturing group, which only captures the last iteration. Here, that is c.

 ([abc])+


In the second example, you match a single character from the list between one and unlimited times.

 [abc]+


+7

styvane Feb 28 '16 at 2:20


This is how I would think about it. ([abc])+ tries to repeat a captured group. When you use “+” after a capture group, it does not mean that you are going to get two captured groups. What ends up happening, at least in Python's regular expressions and most implementations, is that the “+” forces iteration, and when it finishes the capture group contains only the last match.

If you want to capture the repeated expression, you need to reverse the order of "(...)" and "+", e.g. instead of ([abc])+ use ([abc]+).
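A minimal sketch of that difference, using the "abc" input from the question (the variable names are only illustrative):

```python
import re

text = "abc"

# Quantifier outside the group: the group is re-captured on every
# iteration, so only the last character survives.
repeated_group = re.findall(r"([abc])+", text)

# Quantifier inside the group: the group captures the whole run.
group_around_repeat = re.findall(r"([abc]+)", text)

print(repeated_group)       # ['c']
print(group_around_repeat)  # ['abc']
```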

+2

CS Feb 28 '16 at 2:44


Input: "abc"

 [abc]

matches one character => "a"

 [abc]+

+ Between one and unlimited times, as many times as possible => "abc"

 ([abc])

Capturing group ([abc]) => "a"

 ([abc])+

+ A repeated capturing group only captures the last iteration => "c"
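The four cases above can be checked in one go. This is only a sketch: it uses re.match() on the same "abc" input and records the overall match together with the captures; the results dictionary is just an illustrative name.

```python
import re

text = "abc"
patterns = [r"[abc]", r"[abc]+", r"([abc])", r"([abc])+"]

results = {}
for pattern in patterns:
    m = re.match(pattern, text)
    # group(0) is the overall match; groups() holds the captures.
    results[pattern] = (m.group(0), m.groups())
    print(f"{pattern:10} overall={m.group(0)!r} captures={m.groups()!r}")
```

The last pattern is the interesting one: the overall match is "abc", but the only capture left at the end is "c".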

0

Son vu Feb 28 '16 at 2:17


Grouping just gives different precedence.

([abc])+ => Find one character from the selection. It may match one or more times. It finds that all conditions are met, since + means 1 or more. This breaks the regular expression into two stages.

The ungrouped version, by contrast, is treated as a whole.

-3

Josh S. Feb 28 '16 at 2:16


Alan Moore · Accepted Answer · 2016-02-28T02:55:03+0000

Two things need to be explained here: the behavior of quantified groups, and the design of the findall() method.

In your first example, [abc] matches a, which is captured in group #1. Then it matches b and captures it in group #1, overwriting the a. Then again with c, and that is what remains in group #1 at the end of the match.

But it does match the whole string. If you were using search() or finditer(), you could look at the MatchObject and see that group(0) contains abc and group(1) contains c. But findall() returns strings (or tuples of strings when there is more than one group), not MatchObjects. If there are no groups, it returns a list of the overall matches; if there are groups, the list contains all the captures, but not the overall match.
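To see this for yourself, here is a small sketch. The input "abc-cab" is my own assumption (not from the question), chosen so that there is more than one match to look at.

```python
import re

pattern = re.compile(r"([abc])+")
text = "abc-cab"  # illustrative input, not from the question

# finditer() yields match objects, so both the overall match and the
# capture are available.
pairs = [(m.group(0), m.group(1)) for m in pattern.finditer(text)]
print(pairs)  # [('abc', 'c'), ('cab', 'b')]

# findall() with one group returns only that group's captures.
captures = pattern.findall(text)
print(captures)  # ['c', 'b']

# search() returns the first match object.
first = pattern.search(text)
print(first.group(0), first.group(1))  # abc c
```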

So both of your regexes match the whole string, but the first one also captures and discards each character individually (which is kind of pointless). It is only the unexpected behavior of findall() that makes it look like you are getting different results.
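If you need the parentheses only for grouping, one option (not mentioned in the question, so treat it as a suggestion) is a non-capturing group, (?:...). With no capturing group in the pattern, findall() goes back to returning the overall match:

```python
import re

text = "abc"

# (?:...) groups without capturing, so findall() has no group to report
# and returns the overall match instead.
non_capturing = re.findall(r"(?:[abc])+", text)
print(non_capturing)  # ['abc']
```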
