Exclude subexpression from regex in C++

Suppose I try to match the following expression using regex.h in C++ and want to get the subexpressions it contains:

/^((1|2)|3) (1|2)$/ 

If it is matched against the string "3 1", the subexpressions are:

 "3 1" "3" "1" 

If instead it is matched against the string "2 1", the subexpressions are:

 "2 1" "2" "2" "1" 

This means that, depending on how the first subexpression is evaluated, the last one ends up in a different element of the pmatch array. I understand that this particular example is trivial, as I could remove one of the sets of brackets or just take the last element of the array, but it becomes problematic in more complex expressions.

Suppose all I need is the top-level subexpressions, that is, those that are not nested inside other subexpressions. Is there any way to get them? Or, alternatively, is there a way to find out how many subexpressions were matched inside a given subexpression, so that I can traverse the array no matter how the match evaluates?

thanks

2 answers

There are two general approaches to solving this problem:

  • Named capture groups: (?P<name>...), so you can refer to captured groups explicitly by name.
  • Non-capturing groups, usually written (?:blah), so the group does not become part of the resulting group list and the remaining groups keep their expected order.

It is not clear which regex dialect you are using, so I don’t know whether it supports either of these approaches, but a comparison table of regular expression engines will show which ones do.

Turning the inner group (1|2) into a non-capturing group looks like this:

 /^((?:1|2)|3) (1|2)$/ 
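To illustrate the effect, here is a minimal sketch. It assumes std::regex from the C++11 <regex> header rather than regex.h, because the default ECMAScript grammar of std::regex accepts (?:...) while POSIX extended regular expressions do not define that syntax. The helper name top_level_groups is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the captured groups (without the whole match)
// for the pattern from the answer, or an empty vector if nothing matches.
std::vector<std::string> top_level_groups(const std::string& input) {
    // The inner (?:1|2) is non-capturing, so only two groups are numbered:
    // group 1 is ((?:1|2)|3) and group 2 is the trailing (1|2).
    static const std::regex re("^((?:1|2)|3) (1|2)$");
    std::smatch m;
    std::vector<std::string> out;
    if (std::regex_match(input, m, re)) {
        for (std::size_t i = 1; i < m.size(); ++i) {
            out.push_back(m[i].str());
        }
    }
    return out;
}
```

With this pattern both "3 1" and "2 1" yield exactly two groups, and the second one is always the final digit.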

I do not know regex.h, but in many regex libraries you can use brackets without capturing by starting the group with ?:. So this will keep the inner group from becoming an indexed subexpression:

 /^((?:1|2)|3) (1|2)$/ 
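For regex.h itself, it may help to know that POSIX assigns each parenthesized group a fixed slot in pmatch, numbered by the position of its opening parenthesis, and reports a group that took no part in the match with rm_so and rm_eo set to -1. Under that rule the trailing (1|2) of the original pattern should always be pmatch[3], and the number of slots is re_nsub + 1. A minimal sketch, assuming REG_EXTENDED syntax; the Group struct and the posix_groups helper are hypothetical names used only for illustration.

```cpp
#include <regex.h>
#include <string>
#include <vector>

// Hypothetical record for one pmatch slot.
struct Group {
    bool matched;      // false when the group did not take part in the match
    std::string text;  // matched text, empty when matched is false
};

// Hypothetical helper: compiles `pattern` as a POSIX extended regex and
// returns one Group per pmatch slot (slot 0 is the whole match).
// Returns an empty vector if compilation fails or the input does not match.
std::vector<Group> posix_groups(const char* pattern, const std::string& input) {
    std::vector<Group> out;
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0) {
        return out;
    }
    // re_nsub is the number of parenthesized groups in the pattern.
    std::vector<regmatch_t> pm(re.re_nsub + 1);
    if (regexec(&re, input.c_str(), pm.size(), pm.data(), 0) == 0) {
        for (const regmatch_t& m : pm) {
            if (m.rm_so == -1) {
                out.push_back({false, ""});
            } else {
                out.push_back({true, input.substr(m.rm_so, m.rm_eo - m.rm_so)});
            }
        }
    }
    regfree(&re);
    return out;
}
```

Calling posix_groups("^((1|2)|3) (1|2)$", "3 1") should give four slots, with slot 2 flagged as unmatched, so the array can be traversed by fixed index and the unmatched entries skipped.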

Source: https://habr.com/ru/post/1312671/

