Efficient perl or python algorithm

The interviewer asked a question in an interview to write a quick and efficient algorithm for the function below,

Problem: Write a function for parsing a given string for the rules below and print the final parsed string as output

write a function that takes a string as input, the length of the string will be between [0..2000000000]

Line

should only be made of the characters 'A', 'B' and 'C' , such as "AAA", "ABCABC", "AAAABBBBABABAAACCCA"

Conversion Rules:


1) 'AB' β†’ 'AA'
2) 'AC' β†’ 'AA'
3) 'AA' β†’ 'A'
4) 'CC' β†’ 'C'
5) 'BC' β†’ 'BB'
6) 'BB' β†’ 'B'


Apply all the above 6 rules randomly on a given line each time and make a finally converted line as an output

For example, input to a function: string "ABCAAB"

ABCAAB β†’ AACAAB [AB = AA]
AACAAB β†’ ACAAB [AA = A]
ACAAB β†’ AAAAB [AC = AA]
AAAAB β†’ AAAB [AA = A]
AAAB β†’ AAB [AA = A]
AAB β†’ AB [AA = A]
AB β†’ AA [AB = AA]
AA β†’ A [AA = A]

End result: 'A'
Because now we can’t apply more rules for the string.

My answer was (pseudo code):

sub myFunc { my $str = @_; flag = 1 while(flag ==1) { if ($str =~ /AB/){ $str =~ s/AB/AA/; next; } elsif($str =~ /AC/){ $str =~ s/AC/AA/; next; } elsif($str =~ /AA/){ $str =~ s/AA/A/; next; } elsif($str =~ /CC/){ $str =~ s/CC/C/; next; } elsif($str =~ /BC/){ $str =~ s/BC/BB/; next; } elsif($str =~ /BB/){ $str =~ s/BB/B/; next; } else { flag = 0 } } //while return $str; } //func 

Can anyone suggest a better algorithm and approach for the above problem?

+7
python algorithm perl
source share
4 answers

Rules 1 through 3 discard any character following A.
Rules 5 and 6 discard any B and C following B.
Rule 4 will discard any C following CC. The substitution order does not matter.

So, after processing, the string will be one of C, CB, CA, CBA, B, BA, A.

One line scan is enough. If you see C, save it and skip the following C; then, if you see B, save it and skip the next B; then if you see that A will save it and stop.

This ABCAAB example immediately yields A.

Decisions with explicit application of rules and several passes are unacceptable, since their behavior can be O(NΒ²) or even O(NΒ³) , and N=2000000000 .

+15
source share

You can repeat the substitution when it matches the conversion rules,

 my %h = ( 'AB' => 'AA', 'AC' => 'AA', 'AA' => 'A', 'CC' => 'C', 'BC' => 'BB', 'BB' => 'B', ); my $s = 'ABCAAB'; 1 while $s =~ s/(AB|AC|AA|CC|BC|BB)/$h{$1}/; # also without /g switch print $s; 

Exit

 A 
+8
source share

here is the python solution:

 In [34]: import ranodm In [35]: rules = {"AB":"AA",'AC':'AA','AA':'A','CC':'C','BC':'BB','BB':'B'} In [36]: keys = rules.keys() In [37]: keys Out[37]: ['AA', 'AC', 'AB', 'BB', 'BC', 'CC'] In [38]: mystr = 'ABCAAB' In [42]: while len(mystr)>=2: r = random.choice(keys) #choose one rule randomly mystr = mystr.replace(r,rules[r]) ....: In [43]: mystr Out[43]: 'A' 
+1
source share

Yves Daoust's answer is right, it makes no sense to imitate it. It seems like a trick and that you must understand this and understand the behavior and implement it directly.

Eaves method, but here is the actual implementation of a similar one:

 def transform(string): a = string.find('A') + 1 b = string.find('B', 0, a or len(string)) + 1 return 'C' * string.startswith('C') + 'B' * (b>0) + 'A' * (a>0) 

I searched for the first β€œA”, then searched for β€œB” to the left of it (or in the entire line if there was no β€œA”). This tells me whether "B" and "A" belong to the exit. For "C", I just need to check the beginning of the line. Although I potentially scan the whole line twice, and not just once, as Yves suggests, using the find function makes it pretty fast, about 100 times faster than just looping through the line β€œmanually” and just looking for β€œA”, (which is only at the end of my test line):

 >>> from timeit import timeit >>> s = 'C' * 100000000 + 'BA' >>> timeit(lambda: transform(s), number=1) 0.10583857561359977 >>> timeit(lambda: next(x for x in s if x == 'A'), number=1) 10.957446603791382 

You can do this with just one scan, using lstrip('C') to find the first character other than 'C' and faster than doing it manually, but it uses extra memory and is still much slower than find

 >>> timeit(lambda: s.lstrip('C'), number=1) 2.411250071361735 

Regular expressions may also do this, but even just scanning my test string as soon as finding β€œA” takes longer than my entire transform :

 >>> timeit(lambda: re.search('A', s), number=1) 0.13403880409939006 
0
source share

All Articles