How to neatly combine "x" and "[x]" with a regular expression without repeating?

I am writing a Perl regular expression to match the strings "x bla" and "[x] bla". One option is /(?:x|\[x\]) bla/. This is undesirable because in the real world x is more complicated, so I don't want to repeat it.

The best solution so far is putting x in a variable and precompiling the regular expression:

 my $x = 'x';
 my $re = qr/(?:$x|\[$x\]) bla/o;
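For reference, here is the question's approach as a small runnable Perl sketch. The test strings and the ^...$ anchors are additions (assuming the whole string must match), and the /o flag is left out because this qr// runs only once.

```perl
use strict;
use warnings;

my $x  = 'x';                       # stand-in for the complicated x
my $re = qr/^(?:$x|\[$x\]) bla$/;   # x written twice, as in the question

for my $s ('x bla', '[x] bla', '[x bla', 'x] bla') {
    printf "%-8s => %s\n", $s, ($s =~ $re ? 'match' : 'no match');
}
```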

Is there a tidier solution? In this case, readability is more important than performance.

5 answers

Perhaps, but it isn't entirely clean. You can use the fact that conditional subpatterns support tests such as (?(N)...), which check whether the Nth capturing group has matched. That way, you can use an expression like /(\[)?X(?(1)\])/ to match "[X]" or "X".
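Applied to the question's pattern, a sketch might look like this. The ^...$ anchors and test strings are assumptions added for illustration. If the real x contains capturing groups of its own, they are numbered after group 1 here, so (?(1)...) still refers to the bracket.

```perl
use strict;
use warnings;

my $x  = qr/x/;    # stand-in for the complicated x, written only once
# (\[)? captures an optional "[" as group 1; (?(1)\]) then demands "]"
# only when group 1 took part in the match.
my $re = qr/^(\[)?$x(?(1)\]) bla$/;

for my $s ('x bla', '[x] bla', '[x bla', 'x] bla') {
    printf "%-8s => %s\n", $s, ($s =~ $re ? 'match' : 'no match');
}
```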


You can also precompile $x. That also makes errors more obvious if $x is really something like ?(+[*{ or anything else that would throw the regex compiler off completely.

 my $x = qr/x/;
 my $re = qr/(?:$x|\[$x\]) bla/o;
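A sketch of the error-reporting benefit, assuming x arrives as a string at run time (the value below is the hypothetical bad input mentioned above). Compiling x on its own inside eval surfaces the problem in x itself, before it is ever combined into the larger pattern.

```perl
use strict;
use warnings;

my $src = '?(+[*{';    # hypothetical x that is not a valid regex

# Compile x by itself; eval turns the fatal regex compile error into $@.
my $x     = eval { qr/$src/ };
my $error = $@;

if (defined $x) {
    my $re = qr/(?:$x|\[$x\]) bla/;
    print "compiled: $re\n";
} else {
    print "x does not compile: $error";
}
```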

In fact, there is no tidier solution in plain regular-expression syntax: "a closing bracket, but only if there was an opening one" needs the matcher to remember what it saw earlier, and expressing that without repeating x takes us beyond ordinary regular expressions into something with memory. (Backreferences would do this, except that a backreference matches the literal text captured earlier, not "this, but only if that matched.")
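A quick way to see the problem with backreferences: \1 re-matches exactly the text group 1 captured, so after an opening "[" it demands another "[" rather than a "]". The anchors and test strings are added for illustration.

```perl
use strict;
use warnings;

# Group 1 captures either "[" or nothing; \1 must then repeat that same text.
my $re = qr/^(\[?)x\1 bla$/;

for my $s ('x bla', '[x] bla', '[x[ bla') {
    printf "%-8s => %s\n", $s, ($s =~ $re ? 'match' : 'no match');
}
```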

Sometimes you can use a two-stage process instead: first replace the complex X with a single character that you know does not occur in the source text (control characters may work for this), which simplifies the second-stage match. This is not always possible; it depends on what you are matching.
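A minimal sketch of the two-stage idea. The pattern for x, the choice of "\x01" as the placeholder, and the helper name matches_bla are all hypothetical; the sketch assumes the placeholder never appears in real input and simply rejects input that contains it.

```perl
use strict;
use warnings;

my $x           = qr/x+/;    # hypothetical complicated x
my $placeholder = "\x01";    # assumption: never occurs in the input

sub matches_bla {
    my ($text) = @_;
    return 0 if index($text, $placeholder) >= 0;    # assumption violated

    # Stage 1: collapse every match of x into the single placeholder char.
    (my $reduced = $text) =~ s/$x/$placeholder/g;

    # Stage 2: \x01 below is the same character as $placeholder, so the
    # bracket alternation only has to deal with one character.
    return $reduced =~ /^(?:\x01|\[\x01\]) bla$/ ? 1 : 0;
}

for my $s ('xx bla', '[x] bla', '[x bla') {
    printf "%-8s => %s\n", $s, (matches_bla($s) ? 'match' : 'no match');
}
```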


You can write something like (\[)?x(??{ defined $1 ? "]" : "" }), but you probably shouldn't.
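For completeness, a sketch of that postponed-subexpression trick. (??{ ... }) runs Perl code during the match and compiles its return value as a sub-pattern; here it returns '\]' when group 1 captured "[" and an empty pattern otherwise. The anchors and test strings are added for illustration.

```perl
use strict;
use warnings;

# The code block is evaluated mid-match, so $1 reflects the match in progress.
my $re = qr/^(\[)?x(??{ defined $1 ? '\]' : '' }) bla$/;

for my $s ('x bla', '[x] bla', '[x bla', 'x] bla') {
    printf "%-8s => %s\n", $s, ($s =~ $re ? 'match' : 'no match');
}
```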


I tested the solution /(\[)?X(?(1)\])/ (which had 7 votes), and it also matched "[X" and "X]", which are incorrect. The original poster's /(?:$x|\[$x\]) bla/ does work, requiring either both brackets or none.
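The matches reported above are most likely because the pattern was tested unanchored: on "[X" it simply finds the bare "X" inside. (The question's /(?:$x|\[$x\]) bla/ is unanchored too, and would likewise accept "[x bla" through its substring "x bla".) A sketch comparing the two forms, assuming the whole string should be matched:

```perl
use strict;
use warnings;

my $unanchored = qr/(\[)?X(?(1)\])/;
my $anchored   = qr/^(\[)?X(?(1)\])$/;    # must cover the whole string

for my $s ('X', '[X]', '[X', 'X]') {
    # $& holds the text the unanchored pattern actually found.
    my $u = $s =~ $unanchored ? "matches '$&'" : 'no match';
    my $w = $s =~ $anchored   ? 'match'        : 'no match';
    printf "%-4s unanchored: %-13s anchored: %s\n", $s, $u, $w;
}
```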

