.NET regex with lookbehind condition and capture group

Pattern: a(?(?<! ) )b (c)

Input: a b c

Description: The conditional should match a space if the preceding character is not a space.

The pattern matches correctly, but capture group $1 is empty (instead of containing c).

Is this a problem with .NET regex, or am I missing something?

Example: http://regexstorm.net/tester?p=a(%3f(%3f%3C!+)+)b+(c)&i=a+b+c

2 answers

I'm not sure whether this behavior is documented (if it is, I could not find it), but a conditional construct whose condition is an explicit zero-width assertion, as in (?(?=expression)yes|no), overrides the very next capture group (it leaves it empty). You can confirm this by running the regex below:

 a(?(?<! ) )b (c)() 

Four ways to overcome this problem:

  • Enclose the condition expression in parentheses, as noted by @DmitryEgorov (this also keeps the second capture group intact, and the extra parentheses are not included in the result). This is the proper way:

     a(?((?<! )) )b (c) 
  • Since this behavior applies only to unnamed capture groups (by default), you can get the expected result using a named capture group:

     a(?(?<! ) )b (?<first>c) 
  • Add an extra capture group anywhere between the conditional and (c):

     a(?(?<! ) )(b) (c) 
  • Avoid using such an expression as the condition if possible. For instance:

     a(?( ) )b (c) 
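
To try the original pattern and the workarounds side by side, a small helper along these lines may be useful. This is only a sketch: the class name ConditionalCaptureDemo and the method Capture are made up for illustration and do not come from the answers, and what is printed for the original pattern may depend on the .NET runtime version in use.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the answers above).
public static class ConditionalCaptureDemo
{
    // Returns the text of numbered group 1, or of the named group when
    // groupName is given. Returns null when the pattern does not match.
    public static string Capture(string pattern, string input, string groupName = null)
    {
        Match m = Regex.Match(input, pattern);
        if (!m.Success) return null;
        return groupName == null ? m.Groups[1].Value : m.Groups[groupName].Value;
    }

    // Prints the captured value for the original pattern and each workaround.
    public static void PrintAll()
    {
        const string input = "a b c";
        Console.WriteLine("original:           [{0}]", Capture(@"a(?(?<! ) )b (c)", input));
        Console.WriteLine("extra parentheses:  [{0}]", Capture(@"a(?((?<! )) )b (c)", input));
        Console.WriteLine("named group:        [{0}]", Capture(@"a(?(?<! ) )b (?<first>c)", input, "first"));
        Console.WriteLine("extra group:        [{0}]", Capture(@"a(?(?<! ) )(b) (c)", input));
        Console.WriteLine("plain condition:    [{0}]", Capture(@"a(?( ) )b (c)", input));
    }
}
```

Calling ConditionalCaptureDemo.PrintAll() from a Main method prints one line per pattern; the brackets make an empty capture visible.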

In addition to @revo's answer:

It is not only conditional constructs with an explicit zero-width assertion as the condition that are affected. In fact, almost all conditional constructs are affected where the condition expression is itself a parenthesized regex construct (grouping, conditional, other special constructs) used without additional parentheses.

In such cases, there are four types of (incorrect) behavior:

  • The capture group array becomes distorted (as the OP points out): the capture group immediately after the conditional construct is lost, and the remaining groups shift left, leaving the last capture group undefined.

    In the following examples, the expected capture distribution is

     $1="a", $2="b", $3="c" 

    whereas the actual result is

     $1="a", $2="c", $3="" (the last one is an empty string) 

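    Applies to the first ten patterns in the list of fixed versions further down, each taken without the extra parentheses around the condition.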

  • An ArgumentException is thrown during regular expression parsing. This actually makes sense, as it clearly warns of a potential error in the regular expression rather than playing funny tricks with the captures, as in the previous case.

    Applies to:

    • (a)(?(?<n>.) )(b) (c) , (a)(?(?'n'.) )(b) (c) - named groups - exception message: "Alternation conditions do not capture and cannot be named"
    • (a)(?(?'-n' .) )(b) (c) , (?<a>a)(?(?<an>.) )(b) (c) - balancing groups - exception message: "Alternation conditions do not capture and cannot be named"
    • (a)(?(?# comment) )(b) (c) - embedded comment - exception message: "Alternation conditions cannot be comments"
  • An OutOfMemoryException is thrown during pattern matching. This is clearly a bug, I suppose.

    Applies to:

    • (a)(?(?i) )(b) (c) - inline options (not to be confused with group options)
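  • [Surprisingly] works as expected, although the example is rather artificial (it is the last pattern in the list of fixed versions below, taken without the extra parentheses around the condition).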

All of these regular expressions can be fixed by enclosing the condition expression in explicit parentheses (that is, additional ones if the expression itself already contains parentheses). Here are the fixed versions (in order of appearance):

 (a)(?((?=.)) )(b) (c)
 (a)(?((?!z)) )(b) (c)
 (a)(?((?<=.)) )(b) (c)
 (a)(?((?<! )) )(b) (c)
 (a)(?((?: )) )(b) (c)
 (a)(?((?i:.)) )(b) (c)
 (a)(?((?>.)) )(b) (c)
 (a)(?((?(1).)) )(b) (c)
 ((?<n>a))(?((?(n).)) )(b)(c)
 (a)(?((?(?:.).)) )(b) (c)
 (a)(?((?<n>.)) )(b) (c)
 (a)(?((?'n'.)) )(b) (c)
 (a)(?((?'-n' .)) )(b) (c)
 (?<a>a)(?((?<an>.)) )(b) (c)
 (a)(?((?# comment)) )(b) (c)
 (a)(?((?i)) )(b) (c)
 (a)(?((?(.).)) )(b) (c)

Sample code to test all of these expressions: https://ideone.com/KHbqMI
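
For readers who cannot open the link, a comparable harness might look roughly like the sketch below. It is not the code behind the link: the class ConditionalPatternHarness and the method Describe are hypothetical names, and the helper only distinguishes a parse-time ArgumentException, a failed match, and a successful match whose numbered groups it lists. It makes no attempt to guard against the OutOfMemoryException case, so that pattern is better left out of a batch run.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical harness (an assumption for illustration, not the linked sample).
public static class ConditionalPatternHarness
{
    // Describes what happens when the pattern is applied to the input:
    // a parse error, no match, or the values of the numbered groups.
    public static string Describe(string pattern, string input)
    {
        Regex regex;
        try
        {
            regex = new Regex(pattern);
        }
        catch (ArgumentException ex)
        {
            // Pattern rejected while parsing, e.g. a named group used as a condition.
            return "parse error: " + ex.Message;
        }

        Match m = regex.Match(input);
        if (!m.Success) return "no match";

        // Groups[0] is the whole match, so start at 1.
        var parts = new string[m.Groups.Count - 1];
        for (int i = 1; i < m.Groups.Count; i++)
        {
            parts[i - 1] = "$" + i + "=\"" + m.Groups[i].Value + "\"";
        }
        return string.Join(", ", parts);
    }
}
```

For example, Console.WriteLine(ConditionalPatternHarness.Describe(@"(a)(?(?<! ) )(b) (c)", "a b c")) shows how the groups are distributed for one of the problematic patterns on the runtime at hand, and the same call with the parenthesized condition shows the fixed distribution.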
