How do I make an arbitrary Perl regex completely non-capturing? (Answer: you cannot)

Question


How can I remove capturing from arbitrarily nested subgroups in a Perl regex string? I would like to embed any regular expression in a wrapping expression that captures the subregex as a whole entity, as well as statically known subsequent groups. Do I need to convert the regex string manually so that it uses only non-capturing (?:) groups (and hope I don't mess it up), or is there a regex mechanism or Perl library that provides this?

    # How do I 'flatten' $regex to protect $2 and $3?
    # Searching 'ABCfooDE' for 'foo' is OK, but '((B|(C))fo(o)?(?:D|d)?)', etc., breaks.
    # I.e., how would I effectively turn it into '(?:(?:B|(?:C))fo(?:o)?(?:D|d)?)'?
    sub check {
        my ($line, $regex) = @_;
        if ($line =~ /(^.*)($regex)(.*$)/) {
            print "<", $1, "><", $2, "><", $3, ">\n";
        }
    }
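A minimal sketch of the manual conversion the question describes, assuming a very simple pattern: every "(" not already followed by "?" is rewritten to "(?:". The helper names naive_noncapturing and check_flat are hypothetical. It ignores escaped \( sequences preceded by another backslash, parentheses inside character classes, (*VERB) constructs and backreferences such as \1, so it is only safe for patterns as tame as the example.

```perl
use strict;
use warnings;

# Naive sketch: turn every "(" that is not preceded by a backslash and not
# already followed by "?" into "(?:". Assumes no "[(]" character classes,
# no "\\(" sequences, no (*VERB)s and no backreferences in the pattern.
sub naive_noncapturing {
    my ($pattern) = @_;
    (my $out = $pattern) =~ s/(?<!\\)\((?!\?)/(?:/g;
    return $out;
}

# The question's wrapper, fed the flattened pattern so $1/$2/$3 stay put.
sub check_flat {
    my ($line, $regex) = @_;
    my $flat = naive_noncapturing($regex);
    return unless $line =~ /(^.*)($flat)(.*$)/;
    return "<$1><$2><$3>";
}

print naive_noncapturing('((B|(C))fo(o)?(?:D|d)?)'), "\n";   # (?:(?:B|(?:C))fo(?:o)?(?:D|d)?)
print check_flat('ABCfooDE', '((B|(C))fo(o)?(?:D|d)?)'), "\n"; # <AB><CfooD><E>
```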

Addendum: I'm vaguely aware of $&, $` and $', but I've been advised to avoid them where possible, and ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} are not available in my Perl 5.8 environment. The example above could be split into two or three pieces using such methods, and more complex real cases could be iterated manually, but I'd prefer a general solution if one exists.
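For reference, a sketch of the $`/$&/$' route the addendum mentions: these variables describe the last successful match regardless of how many groups the pattern contains, so the caller's captures never get in the way. The name check_matchvars is hypothetical. The caveat in the comment is the usual reason for the "avoid them" advice on older perls such as 5.8.

```perl
use strict;
use warnings;

# $` is the text before the last match, $& the match itself, $' the text after.
# They ignore capture groups entirely. On perls before 5.20, mentioning them
# anywhere in a program slows down every regex match in that program.
sub check_matchvars {
    my ($line, $regex) = @_;
    return unless $line =~ $regex;
    return '<' . $` . '><' . $& . '><' . $' . '>';
}

print check_matchvars('ABCfooDE', qr/((B|(C))fo(o)?(?:D|d)?)/), "\n";    # <AB><CfooD><E>
```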

Conclusion: What I want, and what surprisingly (at least to me) does not exist, is an encapsulating group that makes its contents opaque, so that later positional backreferences see its contents as a single unit and the names inside it are scoped away. gbacon has a potentially useful workaround for Perl 5.10+, and FM shows a manual iterative technique that works on any version and achieves the same effect in specific cases, but j_random_hacker nails it: there is no real language mechanism for encapsulating subexpressions.

+8

regex perl

Jeff Aug 24 '10 at 1:02


6 answers

One way to protect the subpatterns you need is to use named capture buffers:

Additionally, as of Perl 5.10.0 you may use named capture buffers and named backreferences. The notation is (?<name>...) to declare and \k<name> to reference. You may also use apostrophes instead of angle brackets to delimit the name, and you may use the bracketed \g{name} backreference syntax. It's possible to refer to a named capture buffer by absolute and relative number as well. Outside the pattern, a named capture buffer is available via the %+ hash. When different buffers within the same pattern have the same name, $+{name} and \k<name> refer to the leftmost defined group.
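A small sketch of the rules quoted above, assuming Perl 5.10 or later: a named group with a \k<name> backreference, and two groups sharing one name where %+ reports the leftmost group that actually matched. The sub names are made up for illustration.

```perl
use strict;
use warnings;
use 5.010;    # named captures, \k<name> and %+ need Perl 5.10+

# (?<ch>...) declares a named group; \k<ch> refers back to whatever it matched.
sub doubled_around_x {
    my ($s) = @_;
    return $s =~ /(?<ch>.)X\k<ch>/ ? $+{ch} : undef;
}

# Two groups share the name "v"; $+{v} is the leftmost one that is defined.
sub leftmost_v {
    my ($s) = @_;
    return $s =~ /(?<v>\d)?-(?<v>\w)/ ? $+{v} : undef;
}

say doubled_around_x('abXb');    # b
say leftmost_v('-z');            # z (the first "v" group did not participate)
say leftmost_v('1-z');           # 1
```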

In the context of your question, check becomes

    sub check {
        use 5.10.0;
        my ($line, $regex) = @_;
        if ($line =~ /(^.*)($regex)(.*$)/) {
            print "<", $+{one}, "><", $+{two}, "><", $+{three}, ">\n";
        }
    }

Then calling it with

    my $pat = qr/(?<one>(?<two>B|(?<three>C))fo(o)?(?:D|d)?)/;
    check "ABCfooDE", $pat;

outputs

    <CfooD><C><C>
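The same idea can be turned around and applied to the question's own wrapper: name the wrapper's three pieces instead of the user's groups, so numbered groups inside $regex no longer shift anything. This is a sketch assuming Perl 5.10+; the names pre, mid and post are arbitrary choices, and it still misbehaves if $regex reuses those names or contains numbered backreferences such as \1, which would now point at the wrapper's first group.

```perl
use strict;
use warnings;
use 5.010;    # named captures need Perl 5.10+

# The wrapper's pieces are named (pre/mid/post, chosen for illustration), so the
# caller's numbered captures inside $regex cannot displace them.
sub check_named {
    my ($line, $regex) = @_;
    return unless $line =~ /(?<pre>^.*)(?<mid>$regex)(?<post>.*$)/;
    return '<' . $+{pre} . '><' . $+{mid} . '><' . $+{post} . '>';
}

say check_named('ABCfooDE', qr/((B|(C))fo(o)?(?:D|d)?)/);    # <AB><CfooD><E>
```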

+7

Greg Bacon Aug 24 '10 at 1:10


This doesn't solve the general case, but your specific example can be handled with the /g modifier in scalar context, which lets you split the problem into two matches, with the second picking up where the first left off:

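    sub check {
        my ($line, $regex) = @_;
        # Scalar-context //g leaves pos($line) just past the end of this match.
        if ($line =~ /(^.*)($regex)/g) {
            my ($left_side, $regex_match) = ($1, $2);
            # The second //g match on $line resumes where the first one stopped.
            my $right_side = $line =~ /(.*$)/g ? $1 : '';
            print "<$left_side> <$regex_match> <$right_side>\n";    # <AB> <CfooD> <E123>
        }
    }
    check('ABCfooDE123', qr/((B|(C))fo(o)?(?:D|d)?)/);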
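The mechanism this answer relies on can be made explicit with pos() and the \G anchor, both of which exist in Perl 5.8. A sketch with a hypothetical helper rest_after_match: after a successful scalar-context //g match, pos() holds the offset just past the match, and a \G-anchored //g match on the same variable starts exactly there.

```perl
use strict;
use warnings;

# After a successful scalar-context //g match, pos($line) is the offset just past
# the match; \G anchors the next //g match on the same variable at that offset.
sub rest_after_match {
    my ($line, $regex) = @_;
    return unless $line =~ /$regex/g;
    my $end  = pos($line);
    my $rest = $line =~ /\G(.*)/g ? $1 : '';
    return ($end, $rest);
}

my ($end, $rest) = rest_after_match('ABCfooDE123', qr/((B|(C))fo(o)?(?:D|d)?)/);
print "$end $rest\n";    # 7 E123
```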

+5

FMc Aug 24 '10 at 1:32


If you only need the parts of the line before and after the match, you can use the @- and @+ arrays to get the offsets of the match within the line:

    sub check {
        my ($line, $regex) = @_;
        if ($line =~ /$regex/) {
            my $pre   = substr $line, 0, $-[0];
            my $match = substr $line, $-[0], $+[0] - $-[0];
            my $post  = substr $line, $+[0];
            print "<$pre><$match><$post>\n";
        }
    }
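The same arrays also carry per-group offsets, which is handy for seeing where the caller's own groups landed without touching $1, $2 and so on. A sketch with a hypothetical helper group_spans: $-[n] and $+[n] are the start and end offsets of group n (index 0 is the whole match), undef for a group that did not participate, and $#+ is the number of groups in the pattern.

```perl
use strict;
use warnings;

# $-[n] / $+[n]: start / end offsets of capture group n in the last match
# (n == 0 is the whole match); undef if group n did not participate.
# $#+ is the number of capture groups in the pattern.
sub group_spans {
    my ($line, $regex) = @_;
    return unless $line =~ $regex;
    my @spans;
    for my $n (0 .. $#+) {
        push @spans, defined $-[$n]
            ? sprintf('%d:%d-%d', $n, $-[$n], $+[$n])
            : "$n:undef";
    }
    return @spans;
}

print join(' ', group_spans('ABCfooDE', qr/((B|(C))fo(o)?(?:D|d)?)/)), "\n";
# 0:2-7 1:2-7 2:2-3 3:2-3 4:5-6
```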

+2

Sean Aug 24 '10 at 5:44


Perl 5.22 and later reportedly have a /n modifier that makes plain (...) groups non-capturing (named groups still capture).
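A sketch of how /n could be combined with a named wrapper, assuming Perl 5.22+. One subtlety, as I understand the flag rules: an interpolated qr// object carries its own flags (its stringified form starts with "(?^"), so /n on the outer pattern does not reach into it; the sketch therefore takes the inner pattern as a plain string. The name check_n is hypothetical, and a string pattern containing \1 would still break because its group 1 no longer exists.

```perl
use strict;
use warnings;
use 5.022;    # the /n modifier was added in Perl 5.22

# With /n, plain (...) groups do not capture; named groups still do.
# $inner is a plain string on purpose: a qr// object keeps its own flags and
# would keep its captures even inside a /n pattern.
sub check_n {
    my ($line, $inner) = @_;
    return unless $line =~ /(?<pre>^.*)(?<mid>$inner)(?<post>.*$)/n;
    # $#+ is the number of capture groups in the last successfully matched pattern
    return ('<' . $+{pre} . '><' . $+{mid} . '><' . $+{post} . '>', $#+);
}

my ($out, $groups) = check_n('ABCfooDE', '((B|(C))fo(o)?(?:D|d)?)');
say "$out ($groups groups)";    # <AB><CfooD><E> (3 groups)
```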

+1

Mikhail Lisakov Feb 05 '16 at 10:12


This doesn't disable capturing, but it can accomplish what you want:

    $ perl -wle '$_ = "123abc"; /(\d+)/ && print "num: $1"; { /([a-z]+)/ && print "letter: $1"; } print "num: $1";'
    num: 123
    letter: abc
    num: 123

Capture variables are dynamically scoped to the enclosing block, so a match inside a new block does not affect $1 outside it.
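Because a subroutine body is also a block, the same scoping applies when a caller-supplied regex is run inside a helper sub: the caller's $1 is back to its old value once the sub returns. A sketch; first_group and demo_scope are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical helper: run a caller-supplied regex inside a sub and return its $1.
sub first_group {
    my ($s, $re) = @_;
    return unless $s =~ $re;
    my $got = $1;    # copy before the sub's block (and its capture scope) ends
    return $got;
}

sub demo_scope {
    my @seen;
    if ('x42' =~ /(\d+)/) {
        my $inner = first_group('abc', qr/([a-z])c/);    # inner match inside the sub
        push @seen, $inner, $1;                          # $1 is still the outer "42"
    }
    return @seen;
}

print join(' ', demo_scope()), "\n";    # b 42
```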

0

nicomen Aug 24 '10 at 1:06


j_random_hacker · Accepted Answer · 2010-08-24T02:24:19+0000

In general, you cannot.

Even if you could convert every (...) to (?:...), this would not work in general, because the pattern may rely on backreferences: e.g. /(.)X\1/, which matches any character, followed by X, followed by that same character.
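A sketch of why backreferences defeat any wrapping scheme: once the question's wrapper adds its own group 1, a \1 written inside the inner pattern refers to (^.*) instead of the inner (.), so it silently matches strings it should reject. A relative backreference such as \g{-1} avoids this particular case, but it needs Perl 5.10 and the caller has to have written the pattern that way. The helper names are hypothetical.

```perl
use strict;
use warnings;
use 5.010;    # \g{-1} relative backreferences need Perl 5.10+

# Does $inner match $line on its own?
sub standalone { my ($line, $inner) = @_; return $line =~ /$inner/ ? 1 : 0 }

# Does it match inside the question's wrapper? The wrapper adds group 1, so a
# \1 written in $inner now refers to (^.*), not to the inner (.).
sub wrapped { my ($line, $inner) = @_; return $line =~ /(^.*)($inner)(.*$)/ ? 1 : 0 }

for my $inner ('(.)X\1', '(.)X\g{-1}') {
    say "$inner: standalone=", standalone('aXb', $inner), " wrapped=", wrapped('aXb', $inner);
}
# (.)X\1: standalone=0 wrapped=1
# (.)X\g{-1}: standalone=0 wrapped=0
```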

So unless Perl has some mechanism for discarding captures "after the fact", there is no way to solve your problem for all regular expressions. The best you can do (if you have Perl 5.10) is to use gbacon's suggestion and hope to generate a unique name for the capture buffer.
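One way to "hope for a unique name" is to pick a prefix that does not occur anywhere in the pattern's source text. A sketch assuming Perl 5.10+; unique_prefix and check_unique are hypothetical names, and this only avoids name clashes: numbered backreferences inside the caller's pattern are still shifted by the wrapper, as explained above.

```perl
use strict;
use warnings;
use 5.010;    # named captures need Perl 5.10+

# Hypothetical helper: extend a prefix until it appears nowhere in the pattern's
# stringified source, so our group names cannot collide with the caller's.
sub unique_prefix {
    my ($pattern) = @_;
    my $prefix = 'wrap';
    $prefix .= '_' while index("$pattern", $prefix) >= 0;
    return $prefix;
}

sub check_unique {
    my ($line, $regex) = @_;
    my $p = unique_prefix($regex);
    # \$ keeps "$)" from being interpolated as a variable in this double-quoted string
    my $wrapped = "(?<${p}pre>^.*)(?<${p}mid>$regex)(?<${p}post>.*\$)";
    return unless $line =~ /$wrapped/;
    return '<' . $+{"${p}pre"} . '><' . $+{"${p}mid"} . '><' . $+{"${p}post"} . '>';
}

say check_unique('ABCfooDE', qr/(?<wrap>B|C)foo/);    # <AB><Cfoo><DE>
```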

