An effective method of replacing an appearance in a sequence with a variable or unchanging state

I am looking for an effective technique for finding the sequence of Op occurrences in Seq[Op] . As soon as the discovery occurs, I want to replace the occurrence with a specific replacement and repeat the same search until the list stops changing.

Scenario:

I have three types of Op class classes. Pop() extends Op , Push() extends Op and Nop() extends Op . I want to replace the appearance of Push(), Pop() with Nop() . In principle, the code may look like seq.replace(Push() ~ Pop() ~> Nop()) .

Problem:

Now, when I call seq.replace(...) , I have to search in the sequence for Push(), Pop() to appear. So far, so good. I find this event. But now I have to splicing the appearance of the list and insert the replacement.

Now there are two options. My list may be mutable or immutable. If I use an immutable list, I'm afraid of performance because these sequences usually have more than 500 elements. If I replace many occurrences, such as A ~ B ~ C ~> D ~ E , I will create many new objects. If I'm not mistaken. However, I could also use a mutable sequence such as ListBuffer[Op] .

Basically, from the background with the linked list, I would just do some bending of the pointer, and after only four operations, I finished the replacement without creating new objects. That's why performance bothers me right now. Moreover, for me it is an important operation.

Question:

How would you implement the replace() method in the Scala module and what data structure would you use, bearing in mind that this is a critical operation?

I am pleased with the answers that point me in the right direction or pseudo-code. There is no need to write a complete replacement method.

Thanks.

+4
source share
1 answer

Well, some considerations need to be made. First, recall that tail does not create objects in lists, and prepending ( :: :) creates only one object for each element added. All in all, as good as you can get, generally speaking.

One way to do this:

 def myReplace(input: List[Op], pattern: List[Op], replacement: List[Op]) = { // This function should be part of an KMP algorithm instead, for performance def compare(pattern: List[Op], list: List[Op]): Boolean = (pattern, list) match { case (x :: xs, y :: ys) if x == y => compare(xs, ys) case (Nil, Nil) => true case _ => false } var processed: List[Op] = Nil var unprocessed: List[Op] = input val patternLength = pattern.length val reversedReplacement = replacement.reverse // Do this until we finish processing the whole sequence while (unprocessed.nonEmpty) { // This inside algorithm would be better if replaced by KMP // Quickly process non-matching sequences while (unprocessed.nonEmpty && unprocessed.head != pattern.head) { processed ::= unprocessed.head unprocessed = unprocessed.tail } if (unprocessed.nonEmpty) { if (compare(pattern, unprocessed)) { processed :::= reversedReplacement unprocessed = unprocessed drop patternLength } else { processed ::= unprocessed.head unprocessed = unprocessed.tail } } } processed.reverse } 

You can get speed with KMP, especially if the pattern search is long.

Now, what is the problem with this algorithm? The problem is that it will not check if the replaced pattern will cause a match before this position. For example, if I replace ACB with C, and I have AACBB input, then the result of this algorithm will be ACB instead of C.

To avoid this problem, you must create a return path. First, you check at what position in your template a replacement may occur:

 val positionOfReplacement = pattern.indexOfSlice(replacement) 

Then you modify the replacement part of the algorithm:

  if (compare(pattern, unprocessed)) { if (positionOfReplacement > 0) { unprocessed :::= replacement unprocessed :::= processed take positionOfReplacement processed = processed drop positionOfReplacement } else { processed :::= reversedReplacement unprocessed = unprocessed drop patternLength } } else { 

This will return to solving the problem.

This algorithm will not work efficiently, however, while multiplying patterns at the same time, and I think this is where you are heading. To do this, you probably need to adapt KMP to do this efficiently, or, otherwise, use DFA to control possible matches. This gets even worse if you want to match AB and ABC.

In practice, the full-hit problem is equivalent to matching regular expression and replacement, where replacement is a matching function. This means that, of course, you can start looking for regular expression algorithms.

EDIT

I forgot to finish my reasoning. If for some reason this technique does not work, then my advice comes with a constant tree vector. Tree vectors allow you to replace partial sequences with a small number of copies.

And if this does not happen, then the solution is a doubly linked list. And choose one of the libraries with a slice replacement - otherwise you may spend too much time debugging a well-known but complicated algorithm.

+4
source

All Articles