How can I find the first occurrence of a pattern in a string from some starting position?

Question

I have a string of arbitrary length and, starting at position p0, I need to find the first occurrence of one of three 3-letter patterns.

Suppose a string contains only letters. I need to find the triplet count, starting from position p0 and jumping forward in triplets until the first appearance of either "aaa", or "bbb" or "ccc".

Is this possible using only regex?

+6

string regex search perl

slashmais Sep 23 '08 at 9:41

5 answers

 $string =~ /^         # from the start of the string
     (?:.{$p0})        # skip (don't capture) "$p0" occurrences of any character
     (?:...)*?         # skip 3 characters at a time,
                       # as few times as possible (non-greedy)
     (aaa|bbb|ccc)     # capture aaa or bbb or ccc as $1
 /x;

(This assumes p0 is zero-based.)

Of course, it is probably more efficient to use substr on the string to skip ahead:

 substr($string, $p0)=~/^(?:...)*?(aaa|bbb|ccc)/;
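The substr version finds the pattern but does not report the triplet count the question asks for. A minimal sketch of one way to get it, assuming the string contains no newlines: `$-[1]` holds the offset of capture group 1 within the substring, so dividing it by 3 gives the number of triplets skipped. The helper name `first_triplet` is hypothetical and not part of the answer.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (triplets_skipped, pattern_found) for the
# first aligned match at or after $p0, or an empty list if there is none.
sub first_triplet {
    my ( $string, $p0 ) = @_;
    return if $p0 > length($string);    # avoid "substr outside of string"

    if ( substr( $string, $p0 ) =~ /^(?:...)*?(aaa|bbb|ccc)/ ) {
        # $-[1] is the start offset of group 1, relative to the substring
        return ( $-[1] / 3, $1 );
    }
    return;
}

my ( $count, $found ) = first_triplet( "alsdhfaaasccclaaaagalkfgblkgbklfs", 9 );
print "skipped $count triplets before $found\n" if defined $count;
```

With the sample string and p0 = 9, the triplets examined are "scc", "cla" and "aaa", so the count is 2.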

+12

Mike G. Sep 23 '08 at 9:44

You cannot count with regular expressions, but you can do something like this:

 pos($string) = $start_from;
 $string =~ m/\G         # anchor to previous pos()
     ((?:...)*?)         # capture everything up to the match
     (aaa|bbb|ccc)
 /xs or die "No match";
 my $result = length($1) / 3;

But I think it is a little faster to use substr() and unpack() to split the string into triplets and step through them in a loop.
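A rough sketch of that substr()/unpack() idea, under the assumption that the three patterns are fixed and the whole tail of the string fits comfortably in memory. The helper name `triplets_until_match` is hypothetical; whether this is actually faster than the regex would need benchmarking.

```perl
use strict;
use warnings;

# Hypothetical helper: number of triplets skipped before the first
# aaa/bbb/ccc, counting from $start_from; returns nothing if no match.
sub triplets_until_match {
    my ( $string, $start_from ) = @_;
    return if $start_from > length($string);

    my %wanted = map { $_ => 1 } qw(aaa bbb ccc);

    # '(a3)*' splits the data into consecutive 3-character chunks;
    # a trailing chunk may be shorter than 3 and simply never matches.
    my @triplets = unpack '(a3)*', substr( $string, $start_from );

    for my $i ( 0 .. $#triplets ) {
        return $i if $wanted{ $triplets[$i] };
    }
    return;
}

my $n = triplets_until_match( "alsdhfaaasccclaaaagalkfgblkgbklfs", 9 );
print defined $n ? "first match after $n triplets\n" : "no match\n";
```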

(Edit: length(), not lenght() ;-)

+9

moritz Sep 23 '08 at 9:56

The bulk of this is split /(...)/. At the end, you will have the triplet positions at which each pattern occurs.

 my @expected_triplets = qw<aaa bbb ccc>;
 my $data_string = 'fjeidoaaaivtrxxcccfznaaauitbbbfzjasdjfncccftjtjqznnjgjaaajeitjgbbblafjan';
 my $place = 0;
 # split with a capturing group returns the triplets; grep drops the empty fields
 my @triplets = grep { length } split /(...)/, $data_string;
 my %occurrence_for = map { $_, [] } @expected_triplets;
 foreach my $i ( 0..$#triplets ) {    # $#triplets is the last valid index
     my $triplet = $triplets[$i];
     push( @{$occurrence_for{$triplet}}, $i ) if exists $occurrence_for{$triplet};
 }

Or, for a simple count using only a regex (this uses the experimental (??{ }) construct):

 my ( $count, %count );
 my $data_string = 'fjeidoaaaivtrxxcccfznaaauitbbbfzjasdjfncccftjtjqznnjgjaaajeitjgbbblafjan';
 $data_string =~ m/(aaa|bbb|ccc)(??{ $count++; $count{$^N}++ })/g;

0

Axeman Sep 23 '08 at 18:24

If speed is a serious concern, you can, depending on what the 3 strings are, get really fancy by building a tree (for example, the Aho-Corasick algorithm or similar).

A lookup table covering every possible state is also an option, e.g. state[0]['a'] = 0 if none of the strings starts with 'a'.
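To make the state-table idea concrete, here is a hedged sketch of a plain trie walked one triplet at a time. It is not full Aho-Corasick: because matches must be aligned to triplet boundaries, failure links are not needed, so only the transition table is built. The names `build_trie` and `scan_triplets` are hypothetical.

```perl
use strict;
use warnings;

# Build a transition table: $next[$state]{$char} is the following state.
# State 0 is the root; %accept maps a final state to the pattern it spells.
sub build_trie {
    my @patterns = @_;
    my @next = ( {} );
    my %accept;
    for my $pattern (@patterns) {
        my $state = 0;
        for my $char ( split //, $pattern ) {
            if ( !exists $next[$state]{$char} ) {
                push @next, {};
                $next[$state]{$char} = $#next;    # index of the new state
            }
            $state = $next[$state]{$char};
        }
        $accept{$state} = $pattern;
    }
    return ( \@next, \%accept );
}

# Returns (triplets_skipped, pattern) for the first aligned match, or nothing.
sub scan_triplets {
    my ( $string, $p0, @patterns ) = @_;
    my ( $next, $accept ) = build_trie(@patterns);

    for ( my $pos = $p0 ; $pos + 3 <= length($string) ; $pos += 3 ) {
        my $state = 0;
        for my $offset ( 0 .. 2 ) {
            $state = $next->[$state]{ substr( $string, $pos + $offset, 1 ) };
            last if !defined $state;    # no pattern continues with this char
        }
        return ( ( $pos - $p0 ) / 3, $accept->{$state} )
            if defined $state && exists $accept->{$state};
    }
    return;
}

my ( $skipped, $hit ) =
    scan_triplets( "alsdhfaaasccclaaaagalkfgblkgbklfs", 9, qw(aaa bbb ccc) );
print "found $hit after $skipped triplets\n" if defined $hit;
```

For only three short patterns a hash lookup or the regex is likely just as good; a table like this mainly pays off when the pattern set grows.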

0

Brian Nov 07 '08 at 21:16

brian d foy · Accepted Answer · 2008-09-23T10:19:50+0000

Moritz says this approach can be faster than a regular expression. Even if it is a little slower, it is easier to understand at 5 in the morning. :)

 #             0123456789.123456789.123456789.
 my $string = "alsdhfaaasccclaaaagalkfgblkgbklfs";
 my $pos    = 9;
 my $length = 3;
 my $regex  = qr/^(aaa|bbb|ccc)/;

 while( $pos < length $string )
     {
     print "Checking $pos\n";

     if( substr( $string, $pos, $length ) =~ /$regex/ )
         {
         print "Found $1 at $pos\n";
         last;
         }

     $pos += $length;
     }
