Regular expression: match a sequence exactly n times, anywhere in the line

I have a regex question. Take, for example, these lines:

  • ... AAA BZBZB CCCDDD ...
  • ... BZBZB DDD BZBZB CCC ...

I am looking for a regular expression that matches lines in which BZBZB occurs exactly n times. So, if I wanted to match the sequence only once, I should get only the first line as output.

The sequence can appear at any position in the line, and the regex should be compatible with grep or egrep.

Thanks in advance.

3 answers

grep '\(.*BZBZB\)\{5\}' will match 5 occurrences, but it also matches every line in which BZBZB appears 5 or more times, because grep reports a line as soon as any substring of it matches. Since grep's regular expressions have no way of negatively matching a whole string (a negated bracket expression such as [^BZ] only excludes single characters), this cannot be done with a single command unless, for example, you know that the characters making up the string are not used anywhere else in the line.
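To see the "or more" behavior on the question's own sample lines, here is a small sketch (assuming a POSIX shell and grep, with n = 1 as in the question):

```shell
#!/bin/sh
# The question's two sample lines: BZBZB occurs once in the first and twice in the second.
sample='... AAA BZBZB CCCDDD ...
... BZBZB DDD BZBZB CCC ...'

# Asking for one occurrence still prints both lines, because a line with two
# occurrences also contains a substring that matches \(.*BZBZB\)\{1\}.
printf '%s\n' "$sample" | grep -e '\(.*BZBZB\)\{1\}'
```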

However, you can do this in two grep commands:

grep '\(.*BZBZB\)\{5\}' temp.txt | grep -v '\(.*BZBZB\)\{6\}'

returns the lines in which BZBZB appears exactly 5 times. (In effect, it performs a positive test for 5 or more occurrences, and then a negative test for 6 or more.)
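The same two-grep idea can be parameterized by n. This is a minimal sketch assuming a POSIX shell; the function name exactly_n is hypothetical, and the pattern is inserted into the regex unescaped, so it is treated as a basic regular expression fragment:

```shell
#!/bin/sh
# Hypothetical helper: print the lines of stdin in which PATTERN occurs exactly N times.
# PATTERN is placed inside a basic regular expression as-is (BZBZB needs no escaping).
exactly_n() {
    pattern=$1
    n=$2
    # First grep keeps lines with N or more occurrences,
    # second grep drops lines with N+1 or more.
    grep -e "\(.*$pattern\)\{$n\}" | grep -v -e "\(.*$pattern\)\{$((n + 1))\}"
}

# Usage with the question's sample lines: prints only the first line.
printf '%s\n' '... AAA BZBZB CCCDDD ...' '... BZBZB DDD BZBZB CCC ...' | exactly_n BZBZB 1
```

Calling it with 2 instead of 1 would select the second sample line instead.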


From the grep man page:

   -m NUM, --max-count=NUM
    Stop  reading  a file after NUM matching lines.  If the input is
    standard input from a regular file, and NUM matching  lines  are
    output,  grep  ensures  that the standard input is positioned to
    just after the last matching line before exiting, regardless  of
    the  presence of trailing context lines.  This enables a calling
    process to resume a search.  When grep stops after NUM  matching
    lines,  it  outputs  any trailing context lines.  When the -c or
    --count option is also  used,  grep  does  not  output  a  count
    greater  than NUM.  When the -v or --invert-match option is also
    used, grep stops after outputting NUM non-matching lines.
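Two points from this excerpt matter here: -m counts matching lines rather than individual matches, and it caps the -c count. A small sketch to check both, assuming GNU grep (whose man page this is); the three input lines are made up for illustration:

```shell
#!/bin/sh
# Three input lines; the first and third contain BZBZB (the third contains it twice).
input='AAA BZBZB CCC
DDD
BZBZB EEE BZBZB'

# -m counts matching lines, not individual matches: this prints only the first line.
printf '%s\n' "$input" | grep -m 1 -e BZBZB

# With -c, the reported count is capped at NUM: this prints 1 instead of 2.
printf '%s\n' "$input" | grep -c -m 1 -e BZBZB
```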

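So, with grep:

grep -e "BZ" -o
grep -e "BZ" -m n

The first command (-o) prints each occurrence of "BZ" on a separate line; the second (-m n) stops reading after n matching lines. Piping the first into the second therefore prints at most n occurrences.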

$ echo "ABZABZABX" | grep -e "BZ" -o | grep -e "BZ" -m 1
BZ
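The pipeline above limits how many occurrences are printed; to select whole lines as the question asks, the same -o idea can be applied one line at a time by counting its output. A minimal sketch, assuming a grep that supports -o (GNU and BSD grep do; it is not in POSIX); the function name lines_with_n is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the lines of stdin in which PATTERN occurs exactly N times,
# counting the non-overlapping matches that grep -o prints one per line.
lines_with_n() {
    pattern=$1
    n=$2
    while IFS= read -r line; do
        # wc -l counts the matches; tr removes the padding some wc versions add.
        count=$(printf '%s\n' "$line" | grep -o -e "$pattern" | wc -l | tr -d ' ')
        if [ "$count" -eq "$n" ]; then
            printf '%s\n' "$line"
        fi
    done
}

# Usage with the question's sample lines: prints only the second line.
printf '%s\n' '... AAA BZBZB CCCDDD ...' '... BZBZB DDD BZBZB CCC ...' | lines_with_n BZBZB 2
```

This starts a few processes per input line, so on large files the two-grep approach is likely to be faster.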



It's ugly, but if your grep supports lookahead, this should work:

/^(((?!BZBZB).)*BZBZB){5}((?!BZBZB).)*$/

Edit: the {5} above stands for the variable n in the OP's question. GNU grep appears to support Perl-like regular expressions through the -P option.
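With GNU grep, the lookahead regex can be used directly through -P. A sketch assuming GNU grep built with PCRE support (not every grep provides -P), with n = 1 as in the question:

```shell
#!/bin/sh
# Select lines where BZBZB occurs exactly n times, using grep -P (Perl-compatible regex).
n=1
# Built from single-quoted pieces so that interactive bash does not treat "!" as
# history expansion; only $n is expanded.
pattern='^(((?!BZBZB).)*BZBZB){'"$n"'}((?!BZBZB).)*$'

# Prints only the first sample line.
printf '%s\n' '... AAA BZBZB CCCDDD ...' '... BZBZB DDD BZBZB CCC ...' |
    grep -P -e "$pattern"
```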

Perl example

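use strict;
use warnings;

# Test strings with different numbers of BZBZB occurrences
my @strary = (
  'this is BZBZB BZBZB BZBZB and 4 BZBZB then 5 BZBZB and done',
  'BZBZBBZBZBBZBZBBZBZBBZBZBBZBZBBZBZBBZBZB BZBZB  BZBZB',
  'BZBZBBZBZBBZBZBBZBZBBZBZB 1',
  'BZBZBZBBZBZBBZBZBBZBZBBZBZBBZBZB 2',
);

# Keep only the strings in which BZBZB occurs exactly 5 times: each group skips
# characters that do not start a BZBZB, then consumes one BZBZB; the tail after
# the fifth group must not contain another BZBZB.
my @result = grep /^(((?!BZBZB).)*BZBZB){5}((?!BZBZB).)*$/, @strary;

for (@result) {
    print "Found: '$_'\n";
}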

Output

Found: 'this is BZBZB BZBZB BZBZB and 4 BZBZB then 5 BZBZB and done'
Found: 'BZBZBBZBZBBZBZBBZBZBBZBZB 1'
