This is similar to one of the Learning Perl exercises. The trick is to catch all of the repeated words, so you need a “one or more” quantifier on the duplication:
$str = 'This is Goethe the the the their sentence';
$str =~ s/\b((\w+)(?:\s+\2\b)+)/[$1]/g;
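To see why both \b anchors are there, here is a small sketch that runs the same substitution with and without each of them. The helper name mark is made up for this illustration; it is not part of the original answer.

```perl
use strict;
use warnings;

my $str = 'This is Goethe the the the their sentence';

# Hypothetical helper: apply a pattern to a copy of the text and
# wrap whatever the first capture matched in brackets.
sub mark {
    my ($text, $re) = @_;
    (my $copy = $text) =~ s/$re/[$1]/g;
    return $copy;
}

my $both     = mark($str, qr/\b((\w+)(?:\s+\2\b)+)/);  # both boundaries
my $no_lead  = mark($str, qr/((\w+)(?:\s+\2\b)+)/);    # leading \b dropped
my $no_trail = mark($str, qr/\b((\w+)(?:\s+\2)+)/);    # trailing \b dropped

print "$both\n";      # This is Goethe [the the the] their sentence
print "$no_lead\n";   # Th[is is] Goe[the the the the] their sentence
print "$no_trail\n";  # This is Goethe [the the the the]ir sentence
```

Without the leading \b the match can start in the middle of a word ("This is", "Goethe the"), and without the trailing \b it can end in the middle of one ("the their").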
The features I'm going to use are documented in perlre when they apply to the pattern, or in perlop when they affect how the substitution operator does its job.
If you like, you can use the /x flag to add insignificant whitespace and comments:
$str =~ s/ \b ( (\w+) (?: \s+ \2 \b )+ ) /[$1]/xg;
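Since /x also allows comments, the same substitution can be spread over several lines and annotated. This is only one possible layout, and the comments are my own reading of the pattern:

```perl
use strict;
use warnings;

my $str = 'This is Goethe the the the their sentence';

$str =~ s/
    \b                  # start on a word boundary
    (                   # capture 1: the whole run of repeated words
        (\w+)           # capture 2: the first word of the run
        (?:             # non-capturing group, one pass per repeat
            \s+ \2 \b   # whitespace, the same word again, then a boundary
        )+              # one or more repeats
    )
/[$1]/xg;

print "$str\n";   # This is Goethe [the the the] their sentence
```

Keep the pattern delimiter out of the comments, because it would end the pattern early. Sigils such as $ and @ are best avoided there too, since they can still be interpolated inside a comment.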
I don't like that \2, though, because I hate counting absolute positions. Perl 5.10 lets me use relative backreferences instead. The \g{-1} refers to the immediately preceding capture group:
use 5.010;
$str =~ s/ \b ( (\w+) (?: \s+ \g{-1} \b )+ ) /[$1]/xg;
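One place where the relative form pays off is when the pattern gets embedded in a larger one. The sketch below uses a made-up key=value line as input. The two extra captures in front shift the group numbers, so \2 ends up pointing at the wrong group while \g{-1} still points at the word.

```perl
use strict;
use warnings;
use 5.010;

# The duplicate-word pattern with an absolute and a relative backreference.
my $abs = qr/ \b ( (\w+) (?: \s+ \2     \b )+ ) /x;
my $rel = qr/ \b ( (\w+) (?: \s+ \g{-1} \b )+ ) /x;

# Hypothetical input: a key=value prefix, then text with a repeated word.
my $line = 'k=v; the the end';

# Inside the larger pattern the embedded groups become numbers 3 and 4,
# so \2 now refers to the value ("v") rather than the word.
my $abs_hit = $line =~ / (\w+) = (\w+) ; \s* $abs /x ? 1 : 0;
my $rel_hit = $line =~ / (\w+) = (\w+) ; \s* $rel /x ? 1 : 0;
my $dups    = $rel_hit ? $3 : undef;   # the run of duplicates is capture 3 here

say "absolute: ", $abs_hit ? "matched" : "no match";   # absolute: no match
say "relative: ", $rel_hit ? "matched '$dups'" : "no match";   # relative: matched 'the the'
```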
Counting isn't all that great either, so I can use named captures instead:
use 5.010;
$str =~ s/ \b ( (?<word>\w+) (?: \s+ \k<word> \b )+ ) /[$1]/xg;
I can label the first capture ($1) too, and then get its value from %+ later:
use 5.010;
$str =~ s/ \b (?<dups> (?<word>\w+) (?: \s+ \k<word> \b )+ ) /[$+{dups}]/xg;
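The same %+ hash is available after an ordinary match, so the named captures also work for collecting the duplicates instead of replacing them. A minimal sketch, with a sample string and hash keys of my own choosing:

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical sample text with two runs of repeated words.
my $str = 'It was was the the the their idea';

my @found;
while ($str =~ / \b (?<dups> (?<word>\w+) (?: \s+ \k<word> \b )+ ) /xg) {
    # %+ holds the named captures of the most recent successful match.
    push @found, { word => $+{word}, run => $+{dups} };
}

say "$_->{word}: '$_->{run}'" for @found;
# was: 'was was'
# the: 'the the the'
```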
I shouldn't really need that first capture, since it just refers to everything that matched. Unfortunately, it looks like ${^MATCH} isn't set early enough for me to use it on the replacement side. I think that's a bug. This should work, but it doesn't:
$str =~ s/ \b (?<word>\w+) (?: \s+ \k<word> \b )+ /[${^MATCH}]/pgx;
I'm checking this on blead, but that will take a little while to compile on my tiny machine.
brian d foy