How can I find the first occurrence of a pattern in a string from some starting position?

I have a string of arbitrary length and, starting at position p0, I need to find the first occurrence of one of three three-letter patterns.

Suppose the string contains only letters. Starting from position p0 and jumping forward three characters at a time, I need to count the triplets up to the first appearance of "aaa", "bbb" or "ccc".

Is this possible using only regex?

+6
string regex search perl
5 answers
Moritz says this can be faster than a single regular expression. Even if it is a little slower, it is easier to understand at 5 in the morning. :)
    #             0123456789.123456789.123456789.
    my $string = "alsdhfaaasccclaaaagalkfgblkgbklfs";
    my $pos    = 9;
    my $length = 3;
    my $regex  = qr/^(aaa|bbb|ccc)/;

    while ( $pos < length $string )
        {
        print "Checking $pos\n";

        if ( substr( $string, $pos, $length ) =~ /$regex/ )
            {
            print "Found $1 at $pos\n";
            last;
            }

        $pos += $length;
        }
+12
    $string =~ /^               # from the start of the string
                (?:.{$p0})      # skip (don't capture) "$p0" occurrences of any character
                (?:...)*?       # skip 3 characters at a time,
                                # as few times as possible (non-greedy)
                (aaa|bbb|ccc)   # capture aaa or bbb or ccc as $1
               /x;

(This assumes p0 is zero-based.)

Of course, it is probably more efficient to use substr on the string to skip ahead:

 substr($string, $p0)=~/^(?:...)*?(aaa|bbb|ccc)/; 
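
To get the triplet count the question asks for, the skipped part can be captured as well. The following is only a sketch under that assumption: the helper name triplets_before_match is made up for illustration, and it returns nothing when no aligned match exists.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer above): returns how many whole
# triplets are skipped before the first aligned "aaa", "bbb" or "ccc",
# or an empty return if there is no such triplet.
sub triplets_before_match {
    my ( $string, $p0 ) = @_;
    return if $p0 > length $string;    # avoid "substr outside of string"

    # $1 holds the skipped triplets, $2 the pattern that matched
    if ( substr( $string, $p0 ) =~ /^((?:...)*?)(aaa|bbb|ccc)/ ) {
        return length($1) / 3;
    }
    return;
}

my $count = triplets_before_match( "alsdhfaaasccclaaaagalkfgblkgbklfs", 9 );
print defined $count ? "Skipped $count triplets\n" : "No match\n";
```

The non-greedy (?:...)*? keeps the match aligned to triplet boundaries relative to p0, so an "aaa" that straddles two triplets is not counted.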
+12

You cannot count with regular expressions, but you can do something like this:

    pos($string) = $start_from;
    $string =~ m/\G             # anchor to previous pos()
                 ((?:...)*?)    # capture everything up to the match
                 (aaa|bbb|ccc)
                /xs or die "No match";
    my $result = length($1) / 3;

But I think it is a little faster to use substr() and unpack() to split the string into triplets and walk over the triplets in a loop.
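
A rough sketch of what that substr()/unpack() loop might look like. The helper name first_triplet_index and the %wanted lookup hash are assumptions made for illustration, and no timing claim is being made here.

```perl
use strict;
use warnings;

# Hypothetical helper: index (counted in triplets from $start_from) of the
# first triplet equal to aaa, bbb or ccc; empty return if there is none.
sub first_triplet_index {
    my ( $string, $start_from ) = @_;
    return if $start_from > length $string;

    my %wanted = map { $_ => 1 } qw(aaa bbb ccc);

    # '(a3)*' cuts the data into 3-character chunks; a trailing chunk
    # may be shorter than 3 characters and simply never matches.
    my @triplets = unpack '(a3)*', substr( $string, $start_from );

    for my $i ( 0 .. $#triplets ) {
        return $i if $wanted{ $triplets[$i] };
    }
    return;
}

my $index = first_triplet_index( "alsdhfaaasccclaaaagalkfgblkgbklfs", 9 );
print defined $index ? "First match is triplet $index\n" : "No match\n";
```

Note that a result of 0 is a valid answer (the very first triplet matched), so callers should test the result with defined rather than for truth.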

(Edit: length(), not lenght() ;-)

+9

The bulk of the work here is done by split /(...)/. At the end, you will have the positions and the occurrence data for each triplet.

    my @expected_triplets = qw<aaa bbb ccc>;
    my $data_string = 'fjeidoaaaivtrxxcccfznaaauitbbbfzjasdjfncccftjtjqznnjgjaaajeitjgbbblafjan';
    my $place = 0;

    # split with a capturing group returns the captured triplets;
    # grep { length } drops the empty fields between them
    my @triplets       = grep { length } split /(...)/, $data_string;
    my %occurrence_for = map { $_ => [] } @expected_triplets;

    # $#triplets is the last valid index (0..@triplets would run one past the end)
    foreach my $i ( 0 .. $#triplets ) {
        my $triplet = $triplets[$i];
        push( @{ $occurrence_for{$triplet} }, $i )
            if exists $occurrence_for{$triplet};
    }

Or, for a simple count using a regex (this uses the experimental (??{ }) construct):

    my ( $count, %count );
    my $data_string = 'fjeidoaaaivtrxxcccfznaaauitbbbfzjasdjfncccftjtjqznnjgjaaajeitjgbbblafjan';
    $data_string =~ m/(aaa|bbb|ccc)(??{ $count++; $count{$^N}++ })/g;
0

If speed is a serious concern, you can, depending on what the three strings are, get really fancy and build a tree (for example, with the Aho-Corasick algorithm or similar).

You can build a transition map for each possible state, e.g. state[0]['a'] = 0 if none of the strings starts with 'a'.
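
As a loose illustration of the state-map idea, here is a hand-built table for the aligned-triplet case from the question. It is not Aho-Corasick, just a small sketch; the state names (S, X1, X2, a1, a2, ...) and the '*' default key are invented for this example, and the // operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Illustrative states (all names are assumptions):
#   S     = at a triplet boundary
#   a1/a2 = one or two 'a's seen in the current triplet (same for b, c)
#   X1/X2 = current triplet cannot match; 1 or 2 of its characters consumed
# The '*' key is the default transition for any other character.
my %next = (
    S  => { '*' => 'X1' },
    X1 => { '*' => 'X2' },
    X2 => { '*' => 'S' },
);
for my $ch (qw(a b c)) {
    $next{S}{$ch}   = "${ch}1";
    $next{"${ch}1"} = { $ch => "${ch}2", '*' => 'X2' };
    $next{"${ch}2"} = { $ch => 'MATCH',  '*' => 'S' };
}

# Returns the number of triplets skipped before the first match,
# or an empty return if the scan reaches the end of the string.
sub scan_triplets {
    my ( $string, $p0 ) = @_;
    my $state = 'S';
    for my $i ( $p0 .. length($string) - 1 ) {
        my $ch = substr $string, $i, 1;
        $state = $next{$state}{$ch} // $next{$state}{'*'};

        # the matching triplet started two characters before $i
        return ( $i - 2 - $p0 ) / 3 if $state eq 'MATCH';
    }
    return;
}

my $skipped = scan_triplets( "alsdhfaaasccclaaaagalkfgblkgbklfs", 9 );
print defined $skipped ? "Skipped $skipped triplets\n" : "No match\n";
```

Each character is looked at exactly once, which is the point of a table-driven scan; whether that beats the regex engine in practice would have to be measured.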

0