Saving captures with the Perl substitution operator

Can someone explain why the following code ...

    #!/opt/local/bin/perl

    use strict;
    use warnings;

    my $string;

    $string = "\t\t\tEntry";
    print "String: >$string<\n";

    $string =~ s/^(\t*)//gi;
    print "\$1: >$1<\n";
    print "String: >$string<\n";
    print "\n";

    $string = "\t\t\tEntry";
    $string =~ s/^(\t*)([^\t]+)/$2/gi;
    print "\$1: >$1<\n";
    print "String: >$string<\n";
    print "\n";

    exit 0;

... produces the following output ...

    String: > Entry<
    Use of uninitialized value in concatenation (.) or string at ~/sandbox.pl line 12.
    $1: ><
    String: >Entry<

    $1: > <
    String: >Entry<

... or, more directly: why doesn't the value captured by the first substitution persist in $1?

+6
regex perl capture
2 answers

I tried this on two installations of Perl 5.12 and did not run into the problem. Perl 5.8 does show it.

Because you have the g flag, Perl keeps trying to match the pattern until it fails. See the debug output below.

So your version does not work in Perl 5.8, but this does:

    my $c1;
    $string =~ s/^(\t*)/$c1=$1;''/ge;

That way, every time the pattern matches, the capture is saved in $c1.
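The same idea extends to patterns that match more than once. The following is a minimal sketch, not part of the original answer: the input string, the @captures array, and the 'N' placeholder are made up for illustration. It pushes a copy of $1 onto an array from inside the /e replacement, so every capture survives regardless of what happens to $1 afterwards.

```perl
use strict;
use warnings;

# Hypothetical input, chosen only to give /g several matches.
my $text = "a=1, b=22, c=333";
my @captures;

# /e evaluates the replacement as Perl code. The push stores a copy of $1
# for each match; the last expression ('N') becomes the replacement text.
$text =~ s/(\d+)/push @captures, $1; 'N'/ge;

print "captures: @captures\n";   # captures: 1 22 333
print "text:     $text\n";       # text:     a=N, b=N, c=N
```

Copying the capture at match time is the design point here: the saved values no longer depend on the state of $1 once the substitution loop has finished.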

This is what use re 'debug' says:

    Compiling REx `^(\t*)'
    size 9 Got 76 bytes for offset annotations.
    first at 2
       1: BOL(2)
       2: OPEN1(4)
       4:   STAR(7)
       5:     EXACT <\t>(0)
       7: CLOSE1(9)
       9: END(0)
    anchored(BOL) minlen 0
    Offsets: [9]
        1[1] 2[1] 0[0] 5[1] 3[1] 0[0] 6[1] 0[0] 7[0]
    Compiling REx `^(\t*)([^\t]+)'
    size 25 Got 204 bytes for offset annotations.
    first at 2
       1: BOL(2)
       2: OPEN1(4)
       4:   STAR(7)
       5:     EXACTF <\t>(0)
       7: CLOSE1(9)
       9: OPEN2(11)
      11:   PLUS(23)
      12:     ANYOF[\0-\10\12-\377{unicode_all}](0)
      23: CLOSE2(25)
      25: END(0)
    anchored(BOL) minlen 1
    Offsets: [25]
        1[1] 2[1] 0[0] 5[1] 3[1] 0[0] 6[1] 0[0] 7[1] 0[0] 13[1] 8[5] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 14[1] 0[0] 15[0]
    String: > Entry<
    Matching REx `^(\t*)' against ` Entry'
      Setting an EVAL scope, savestack=5
       0 <> < Entry>    |  1:  BOL
       0 <> < Entry>    |  2:  OPEN1
       0 <> < Entry>    |  4:  STAR
                               EXACT <\t> can match 3 times out of 2147483647...
      Setting an EVAL scope, savestack=5
       3 < > <Entry>    |  7:  CLOSE1
       3 < > <Entry>    |  9:  END
    Match successful!
    match pos=0
    Use of uninitialized value in substitution iterator at - line 11.
    Matching REx `^(\t*)' against `Entry'
      Setting an EVAL scope, savestack=5
       3 < > <Entry>    |  1:  BOL
                               failed...
    Match failed
    Freeing REx: `"^(\\t*)"'
    Freeing REx: `"^(\\t*)([^\\t]+)"'

Since you are matching whitespace anchored at the beginning of the string, you need neither g nor i. So it may be that you are actually trying to do something else.
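As a sketch of that suggestion (the $indent variable is introduced here for illustration and is not from the question), dropping both flags and copying $1 immediately after a successful substitution looks like this. The pattern is tried only once, so there is no second, failing attempt of the kind shown in the debug output above.

```perl
use strict;
use warnings;

my $string = "\t\t\tEntry";
my $indent = '';

# No /g: the anchored pattern is applied a single time.
# Copy $1 right away, and only if the substitution reported success.
if ($string =~ s/^(\t*)//) {
    $indent = $1;
}

printf "indent: %d tab(s)\n", length $indent;   # indent: 3 tab(s)
print  "String: >$string<\n";                    # String: >Entry<
```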

+7

I think that in version 5.10 and higher, the capture buffers are only affected when there is a match.
What is interesting in your example is that $string =~ s/^(\t*)([^\t]+)/$2/gi;
did not reset the capture buffers. This is apparently due to the pre-check that decides whether a match should be attempted at all. In this case ([^\t]+) consumed the entire string in the first match, so the next pass hit "String too short" and the buffers were never reset.

I cannot test it, but $string =~ s/^(\t*)([^\t])//gi should give the same warning.
After if ( s///g ) {}, the capture buffers will not necessarily contain anything when you test them. That is how it was in version 5.8. Even in later versions, this is really just a debugging feature.

Edit: @theracoon, regarding your comment: "I'm sure ([^\t]+) doesn't really consume the entire string. The result definitely doesn't reflect that."

Here is proof that your regular expression consumed the entire string in the first match, pass 1.
On the second pass, there was nothing left. This is how the /g modifier works:
it tries to match the whole regular expression again, starting at the position in the string where the last match ended.

    use re 'debug';
    $string = "\t\t\tEntry";
    $string =~ s/^(\t*)([^\t]+)/$2/gi;
    print "'$string'\n";

Pass 1:
    Matching REx "^(\t*)([^\t]+)" against "%t%t%tEntry"
    8 <%t%t%tEntry> <>
    Match successful!

Pass 2:
    Matching REx "^(\t*)([^\t]+)" against "" (nothing left to match)
    String too short [regexec_flags]...
    Match failed
    'Entry'
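The way /g resumes where the last match ended can also be observed without the regex debugger. This is a small sketch under my own assumptions: @ends, $copy, and $count are illustrative names, and the patterns are simplified versions of the one in the question. It records pos() after each match of an unanchored pattern, then counts how many substitutions an anchored pattern makes under /g.

```perl
use strict;
use warnings;

my $string = "\t\t\tEntry";

# Unanchored pattern: each /g iteration starts where the previous match
# ended, so pos() advances one tab at a time until the match fails.
my @ends;
while ($string =~ /\t/g) {
    push @ends, pos($string);
}
print "match end offsets: @ends\n";   # match end offsets: 1 2 3

# Anchored pattern: the second attempt starts at offset 3, where ^ cannot
# match, so s///g reports exactly one substitution.
my $copy  = $string;
my $count = ($copy =~ s/^\t*//g);
print "substitutions: $count\n";      # substitutions: 1
print "result: >$copy<\n";            # result: >Entry<
```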

+2