How to highlight duplicate repeating words with a Perl regular expression?

Question


I want a Perl regex that matches duplicate words in a string.

Given the following input:

$str = "Thus joyful Troy Troy maintained the the watch of night...";

I need the following output:

  Thus joyful [Troy Troy] maintained [the the] watch of night ...


regex perl

muruga Mar 24 '10 at 3:51


4 answers

This is similar to one of the Learning Perl exercises. The trick is to catch all of the repeated words, so you need a "one or more" quantifier on the repetition:

  $str = 'This is Goethe the the the their sentence';
  $str =~ s/\b((\w+)(?:\s+\2\b)+)/[$1]/g;   # 'This is Goethe [the the the] their sentence'
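As a quick sanity check, here is the same substitution wrapped in a small sub and run on both the question's input and the sentence above. This is only a sketch: the name `highlight_dups` is hypothetical, and the replacement side uses `$1` rather than `\1`, because `\1` there draws a "\1 better written as $1" warning under `use warnings`.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): wrap every run of two or
# more identical, whitespace-separated words in square brackets.
sub highlight_dups {
    my ($text) = @_;
    # (\w+) captures a word as group 2; (?:\s+\2\b)+ demands at least
    # one more copy of it, and the outer group 1 spans the whole run.
    $text =~ s/\b((\w+)(?:\s+\2\b)+)/[$1]/g;
    return $text;
}

my $question = "Thus joyful Troy Troy maintained the the watch of night...";
my $goethe   = 'This is Goethe the the the their sentence';

print highlight_dups($question), "\n";  # Thus joyful [Troy Troy] maintained [the the] watch of night...
print highlight_dups($goethe), "\n";    # This is Goethe [the the the] their sentence
```

The trailing `\b` is what keeps "the their" out of the run: without it, the "the" at the start of "their" would count as another copy.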

The features I'm about to use are described in perlre when they apply to a pattern, or in perlop when they affect how the substitution operator does its work.

If you like the /x flag to add insignificant whitespace and comments:

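  $str =~ s/
      \b             # start at a word boundary
      (              # group 1: the whole run of repeated words
        (\w+)        # group 2: the word itself
        (?:
          \s+ \2 \b  # whitespace, then the same word again
        )+           # one or more times
      )
  /[$1]/xg;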

I don't like that \2, though, because I hate counting relative positions. In Perl 5.10 I can use relative backreferences instead: \g{-1} refers to the immediately preceding capture group:

  use 5.010;
  $str =~ s/ \b ( (\w+) (?: \s+ \g{-1} \b )+ ) /[$1]/xg;
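Relative backreferences pay off when the pattern is reused inside a larger one, because adding capture groups in front no longer breaks the backreference. A sketch assuming Perl 5.10 or later; the variable names and the extra `(^|\s)` group are made up purely to shift the group numbers:

```perl
use 5.010;
use strict;
use warnings;

# The duplicate-word pattern stored as a reusable fragment.
# \g{-1} points at the most recently opened group before it: (\w+).
my $dups = qr/ ( (\w+) (?: \s+ \g{-1} \b )+ ) /x;

my $str = "Thus joyful Troy Troy maintained the the watch of night...";

# Embed the fragment after an extra capture group. The run is now
# group 2 and the word group 3, yet \g{-1} still finds the word,
# whereas a hard-coded \2 would now point at the wrong group.
my @runs;
while ($str =~ / (^|\s) $dups /xg) {
    push @runs, $2;
}

say for @runs;   # prints "Troy Troy", then "the the"
```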

Counting isn't so great either, so I can use labeled (named) captures:

  use 5.010;
  $str =~ s/ \b ( (?<word>\w+) (?: \s+ \k<word> \b )+ ) /[$1]/xg;
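Named captures are just as handy outside a substitution. This sketch, assuming Perl 5.10 or later, runs the same pattern in a `while (//g)` loop and reads each repeated word back by name through %+ to collect which words were doubled:

```perl
use 5.010;
use strict;
use warnings;

my $str = "Thus joyful Troy Troy maintained the the watch of night...";

# Walk through the string and record the repeated word from each run,
# reading it back by name rather than by group number.
my @words;
while ($str =~ / \b (?<word>\w+) (?: \s+ \k<word> \b )+ /xg) {
    push @words, $+{word};
}

say "@words";   # Troy the
```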

I can label the first capture ($1) and then access its value through %+:

  use 5.010;
  $str =~ s/ \b (?<dups> (?<word>\w+) (?: \s+ \k<word> \b )+ ) /[$+{dups}]/xg;
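Labeling a group does not take away its number: named groups are still numbered by the position of their opening parenthesis, so $1 and $+{dups} hold the same text here. A small check, assuming Perl 5.10 or later; the variable names are illustrative:

```perl
use 5.010;
use strict;
use warnings;

my $str = "Thus joyful Troy Troy maintained the the watch of night...";

# Capture the first run both by name (via %+) and by number.
my ($dups, $word, $one, $two);
if ($str =~ / \b (?<dups> (?<word>\w+) (?: \s+ \k<word> \b )+ ) /x) {
    ($dups, $word) = ($+{dups}, $+{word});
    ($one,  $two)  = ($1, $2);
}

say "dups=$dups word=$word";   # dups=Troy Troy word=Troy
say "\$1=$one \$2=$two";       # $1=Troy Troy $2=Troy
```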

I shouldn't need that first capture, though, since it's really just there to refer to everything that matched. Sadly, it looks like ${^MATCH} isn't set early enough for me to use it on the replacement side. I think that's a bug. This should work, but doesn't:

  $str =~ s/ \b (?<word>\w+) (?: \s+ \k<word> \b )+ /[${^MATCH}]/pgx; # DOESN'T WORK
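If ${^MATCH} is not usable there, one workaround (not from the answer) is the older $& variable, which also holds the entire match and is available on the replacement side. The trade-off is that before Perl 5.20, mentioning $& anywhere in a program slowed down every regex match, which is what /p and ${^MATCH} were added to avoid. A sketch assuming Perl 5.10 or later:

```perl
use 5.010;
use strict;
use warnings;

my $str = "Thus joyful Troy Troy maintained the the watch of night...";

# $& is the text of the entire successful match, so no outer capture
# group is needed just to refer to the whole run of repeated words.
$str =~ s/ \b (?<word>\w+) (?: \s+ \k<word> \b )+ /[$&]/xg;

say $str;   # Thus joyful [Troy Troy] maintained [the the] watch of night...
```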

I'm checking this on blead, but that's going to take a while to compile on my tiny machine.


brian d foy Mar 24 '10 at 17:02


You can try:

 $str = "Thus joyful Troy Troy maintained the the watch of night...";
 $str =~ s{\b(\w+)\s+\1\b}{[$1 $1]}g;
 print "$str\n";   # prints: Thus joyful [Troy Troy] maintained [the the] watch of night...
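One caveat: \s+\1 matches exactly two copies of a word, so a run of three is only partly bracketed. This sketch compares it with a quantified repeat, the (?:\s+\2\b)+ form from the previous answer; the sample string is made up:

```perl
use strict;
use warnings;

my $pairs = "the the the end";
my $runs  = $pairs;

# Two-copy pattern: the third "the" is left outside the brackets.
$pairs =~ s{\b(\w+)\s+\1\b}{[$1 $1]}g;

# Quantified repeat: one match swallows the whole run.
$runs =~ s/\b((\w+)(?:\s+\2\b)+)/[$1]/g;

print "$pairs\n";   # [the the] the end
print "$runs\n";    # [the the the] end
```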

Expression used: \b(\w+)\s+\1\b

Explanation:

\b : word boundary
\w+ : a word
() : capture (remember) the word
\s+ : one or more whitespace characters
\1 : the captured word again (a backreference)

It finds two identical whole words separated by whitespace and wraps them in [ ].
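The same pattern can carry that explanation inline with the /x flag. This is a sketch of an equivalent layout, not the answer's own code; note that the comments avoid $ and the brace delimiters, since Perl still interpolates variables inside /x comments:

```perl
use strict;
use warnings;

my $str = "Thus joyful Troy Troy maintained the the watch of night...";

# Same pattern as \b(\w+)\s+\1\b, spread out with /x.
$str =~ s{
    \b      # word boundary
    (\w+)   # a word, captured as group 1
    \s+     # one or more whitespace characters
    \1      # the same word again (backreference)
    \b      # word boundary, so "the their" does not match
}{[$1 $1]}gx;

print "$str\n";   # Thus joyful [Troy Troy] maintained [the the] watch of night...
```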

EDIT:

If you want to keep the original whitespace between the words, capture it as well:

 $str =~s{\b(\w+)(\s+)\1\b}{[$1$2$1]}g;
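To see the difference, here are both replacements run side by side on a made-up string with three spaces between one pair and a tab between the other:

```perl
use strict;
use warnings;

my $str = "joyful Troy   Troy maintained the\tthe watch";

# First version: the replacement always puts a single space between the words.
(my $normalized = $str) =~ s{\b(\w+)\s+\1\b}{[$1 $1]}g;

# Second version: the whitespace is captured as group 2 and reused verbatim.
(my $preserved  = $str) =~ s{\b(\w+)(\s+)\1\b}{[$1$2$1]}g;

print "$normalized\n";  # joyful [Troy Troy] maintained [the the] watch
print "$preserved\n";   # keeps the three spaces and the tab inside the brackets
```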


codaddict Mar 24 '10 at 4:02


Try the following:

 $str =~ s/\b((\S+)\b(?:\s+\2\b)+)/[$1]/g;   # bracket the whole run, not just one copy of the word
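The practical difference between \S+ and \w+ shows up with words that contain apostrophes: \w+ stops at the apostrophe, so "don't don't" is never seen as a repeat. A sketch with a made-up sample string, bracketing the whole run in both cases so that only the word pattern differs:

```perl
use strict;
use warnings;

my $str = "don't don't stop";

# \w+ cannot cross the apostrophe, so no repeated word is found.
(my $with_w = $str) =~ s/\b((\w+)(?:\s+\2\b)+)/[$1]/g;

# \S+ takes "don't" whole; the following \b still requires a word edge.
(my $with_S = $str) =~ s/\b((\S+)\b(?:\s+\2\b)+)/[$1]/g;

print "$with_w\n";  # don't don't stop
print "$with_S\n";  # [don't don't] stop
```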


Krishnachandra Sharma Dec 01 '16 at 8:27


Kip · Accepted Answer · 2010-03-24T04:01:20+0000

This works:

 $str =~ s/\b((\w+)\s+\2)\b/[$1]/g;
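For completeness, a sketch running this pattern on the question's input. The sub name `bracket_pairs` is hypothetical. Since the pattern matches exactly one pair per match, a run of three leaves the last word outside the brackets:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the accepted answer's substitution;
# group 1 is the pair "word word", group 2 the single word.
sub bracket_pairs {
    my ($text) = @_;
    $text =~ s/\b((\w+)\s+\2)\b/[$1]/g;
    return $text;
}

print bracket_pairs("Thus joyful Troy Troy maintained the the watch of night..."), "\n";
# Thus joyful [Troy Troy] maintained [the the] watch of night...
print bracket_pairs("the the the end"), "\n";
# [the the] the end
```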
