Regular expression - matching word only once per line

Given these two lines:

  • ehello goodbye hellot hello goodbye
  • ehello goodbye hello hello goodbye

I want to match line 1 (it contains “hello” only once), but I do NOT want to match line 2 (it contains “hello” more than once).

I tried using a negative lookahead and the like, without any real success.

+7
3 answers

A simple option is this (with the multiline flag enabled and the dot-all flag disabled):

^(?!.*\bhello\b.*\bhello\b).*\bhello\b.*$ 

First, the negative lookahead makes sure the line does not contain “hello” twice; then the rest of the pattern checks that it appears at least once. There are other ways to test the same thing, but I think this one is pretty simple.
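As a rough illustration of how that pattern behaves, here is a minimal Python sketch. The choice of Python is an assumption (the question names no language), and the two test lines are the sample strings used in the Perl answer further down this page. `re.MULTILINE` makes `^` and `$` anchor at every line, and dot-all is left off so `.` stops at the newline.

```python
import re

# Pattern from the answer: the negative lookahead rejects any line that
# contains the whole word "hello" twice; the remainder requires it once.
ONCE = re.compile(r"^(?!.*\bhello\b.*\bhello\b).*\bhello\b.*$", re.MULTILINE)

# Sample input (assumed): the two strings from the Perl answer on this page.
text = (
    "ehello goodbye hellot hello goodbye\n"
    "ehello goodbye hello hello goodbye"
)

# No capture groups, so findall returns each matching line in full.
matches = ONCE.findall(text)
print(matches)
```

Only the first line should come back: “ehello” and “hellot” do not count because `\b` requires a word boundary on both sides of “hello”.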

Of course, you can just match for \bhello\b and count the number of matches ...
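The counting approach could look something like the following Python sketch. The helper name `count_word` is made up for illustration, and Python itself is an assumption since the question names no language.

```python
import re

def count_word(line, word="hello"):
    """Return how many times `word` occurs in `line` as a whole word."""
    # re.escape keeps any regex metacharacters in `word` literal.
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, line))

line1 = "ehello goodbye hellot hello goodbye"
line2 = "ehello goodbye hello hello goodbye"

# A line qualifies when the count is exactly 1.
print(count_word(line1) == 1, count_word(line2) == 1)
```

This keeps the regex trivial and moves the "exactly once" rule into ordinary code, which may be easier to read than a lookahead.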

+4

A more general regex, which matches only when no word at all is repeated, would be:

 ^(?:\b(\w+)\b\W*(?!.*?\b\1\b))*\z 

Although it could be cleaner to invert the result of this match:

 \b(\w+)\b(?=.*?\b\1\b) 

This works by matching a word and capturing it, and then using a lookahead with a backreference to check whether that word does or does not appear again later in the line.
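A small Python sketch of the inverted check, under the assumption that Python is acceptable here (the question names no language); `has_repeated_word` is a hypothetical helper name. A line passes when the pattern finds nothing.

```python
import re

# Second pattern from the answer: capture a word, then look ahead for the
# same word (via the \1 backreference) anywhere later in the line.
REPEATED = re.compile(r"\b(\w+)\b(?=.*?\b\1\b)")

def has_repeated_word(line):
    """True if any whole word occurs more than once in `line`."""
    return REPEATED.search(line) is not None

print(has_repeated_word("one two three"))                        # no repeats
print(has_repeated_word("ehello goodbye hellot hello goodbye"))  # "goodbye" repeats
```

Note that this test is about any repeated word, not just “hello”. On the sample lines used on this page, “goodbye” appears twice in both, so this general check would flag both lines; it answers a broader question than the one asked.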

+2

Since you only care about words (that is, tokens separated by whitespace), you can simply split each line on whitespace and count how often "hello" appears. Since you did not specify a language, here is an implementation in Perl:

 use strict;
 use warnings;
 my $a1 = "ehello goodbye hellot hello goodbye";
 my $a2 = "ehello goodbye hello hello goodbye";
 my @arr1 = split(/\s+/, $a1);
 my @arr2 = split(/\s+/, $a2);
 # grab the number of times that "hello" appears
 my $num_hello1 = scalar(grep { $_ eq "hello" } @arr1);
 my $num_hello2 = scalar(grep { $_ eq "hello" } @arr2);
 print "$num_hello1, $num_hello2\n";

Output:

 1, 2 
+1
