Finding the minimum distance between unsorted and sorted lists

Question


Let A be a list and S be a sorted list of the same elements. Assume all elements are distinct. How do I find a minimal set of "moves" (move X before Y (or end)) that turns A into S?

Examples:

    A = [8,1,2,3]
    S = [1,2,3,8]

    A => S requires one move:
        move 8 before end

    A = [9,1,2,3,0]
    S = [0,1,2,3,9]

    A => S requires two moves:
        move 9 before 0
        move 0 before 1

I prefer javascript or python, but any language will do.

+8

javascript python language-agnostic algorithm

georg Jan 30 '14 at 9:38


4 answers

If you view your two lists as two strings (for example, the numbers are ASCII-encoded values), then the problem is equivalent to finding the operations that transform the first string into the second. The number of operations, in turn, is the Levenshtein distance, or edit distance, between the strings.

The Levenshtein distance can be found using dynamic programming, storing in a matrix the distances between all prefixes of both strings, and then tracing back your steps to find, at each row of the matrix, which operation is optimal (the one that needed the fewest operations to arrive at it).

The longest increasing subsequence algorithm proposed by @IvayloStrandjev is related to the longest common subsequence problem, which in turn is related to edit distance as an alternative metric that allows only insertion and deletion. It is probably more efficient in space, because it exploits the fact that one of the sequences must be sorted; I just wanted to provide an alternative answer that is easier for me to understand.

Here is a Python implementation of the full Levenshtein algorithm, as described on the Wikipedia page linked above (originally published by Wagner and Fischer in 1974), which also provides a proof of correctness. Here we also store the names of the operations in a matrix of the same size as the scores matrix, and print the optimal operation once a row is complete.

    import argparse

    import numpy as np


    class Levenshtein(object):
        def __init__(self, string1, string2):
            self.string1 = string1
            self.string2 = string2
            # scores_matrix[i, j]: distance between the first i items of
            # string1 and the first j items of string2
            self.scores_matrix = np.zeros(
                (len(self.string1) + 1, len(self.string2) + 1), dtype=np.int16)
            # operations_matrix[i, j]: name of the operation chosen for that cell
            self.operations_matrix = np.empty_like(
                self.scores_matrix, dtype=(np.str_, 16))
            self.total_steps = 0

        def distance(self):
            m = len(self.string1) + 1
            n = len(self.string2) + 1
            for i in range(m):
                self.scores_matrix[i, 0] = i
            for j in range(n):
                self.scores_matrix[0, j] = j
            for j in range(1, n):
                for i in range(1, m):
                    if self.string1[i - 1] == self.string2[j - 1]:
                        self.scores_matrix[i, j] = self.scores_matrix[i - 1, j - 1]
                        self.operations_matrix[i, j] = 'match'
                    else:
                        self.scores_matrix[i, j] = self.select_operation(i, j)
                    if j == n - 1:  # a row is complete
                        self.determine_best_op_and_print(i)
            return self.scores_matrix[m - 1, n - 1]

        def select_operation(self, i, j):
            possible_ops = ['delete', 'insert', 'substitute']
            ops_scores = [
                self.scores_matrix[i - 1, j] + 1,      # deletion
                self.scores_matrix[i, j - 1] + 1,      # insertion
                self.scores_matrix[i - 1, j - 1] + 1]  # substitution
            chosen_op = min(ops_scores)
            chosen_op_name = possible_ops[ops_scores.index(chosen_op)]
            self.operations_matrix[i, j] = chosen_op_name
            return chosen_op

        def determine_best_op_and_print(self, i):
            # find the last position of the minimum score in row i
            reversed_row = self.scores_matrix[i][::-1]
            reversed_pos_min = np.argmin(reversed_row)
            pos_min = len(self.scores_matrix[i]) - (reversed_pos_min + 1)
            best_op_name = self.operations_matrix[i, pos_min]
            if best_op_name != 'match':
                self.total_steps += 1
                print(best_op_name, self.string1[i - 1], self.string2[pos_min - 1])


    def parse_cli():
        parser = argparse.ArgumentParser()
        parser.add_argument('--list', nargs='*', required=True)
        return parser.parse_args()


    if __name__ == '__main__':
        args = parse_cli()
        A = args.list
        S = sorted(A)
        lev = Levenshtein(A, S)
        dist = lev.distance()
        print("{} total steps were needed; edit distance is {}".format(
            lev.total_steps, dist))

Here's how to run the code with the examples you provided, along with the expected output:

    $ python levenshtein.py --list 8 1 2 3
    substitute 8 1
    1 total steps were needed; edit distance is 2
    $ python levenshtein.py --list 9 1 2 3 0
    substitute 9 0
    substitute 0 9
    2 total steps were needed; edit distance is 2
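For a dependency-free cross-check of the distances reported above, here is a minimal sketch of the same dynamic-programming recurrence that keeps only two rows of the matrix. The function name `edit_distance` is an assumption made for illustration, and the sketch returns only the distance, not the operations.

```python
def edit_distance(source, target):
    """Levenshtein distance between two sequences, keeping two DP rows."""
    # previous[j]: distance between the processed prefix of source
    # and the first j items of target
    previous = list(range(len(target) + 1))
    for i, s_item in enumerate(source, start=1):
        current = [i]  # turning i items into an empty prefix takes i deletions
        for j, t_item in enumerate(target, start=1):
            cost = 0 if s_item == t_item else 1
            current.append(min(
                previous[j] + 1,          # deletion
                current[j - 1] + 1,       # insertion
                previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1]


if __name__ == "__main__":
    for a in ([8, 1, 2, 3], [9, 1, 2, 3, 0]):
        print(a, "->", sorted(a), "distance", edit_distance(a, sorted(a)))
```

Because it works on any pair of sequences, it can be called on the lists directly without encoding them as strings.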

+4

logc Feb 16 '14 at 14:07


This depends heavily on several parameters of the problem that are not specified. First, which moves are legal? Only swaps of adjacent elements? Arbitrary deletions and insertions? Second, do you just need the number of moves, or do you need the list of specific moves to perform? These lead to different algorithms:

Only adjacent swaps - this is called the inversion count, if you only care about the minimum number.
Deletions, non-adjacent swaps, etc. - the Levenshtein distance mentioned earlier is a more general edit distance. One tricky point is how you define your set of moves. Is moving an element three places one move, or is it two moves (a delete and an insert)?

Inversion counts are fairly simple and can be computed with some basic recursive algorithms. You can use a merge sort to find the inversion count between two lists, by using one list to build a transformed version of the other in which the new elements are indexes. So, if you have two sequences, you can do:

 sequence = [seq2.index(element) for element in seq]

A simple, straightforward merge sort implementation for counting inversions:

    def mergeSortInversionCount(sequence):
        if len(sequence) <= 1:
            return 0, sequence
        else:
            firstHalf = sequence[:len(sequence) // 2]
            secondHalf = sequence[len(sequence) // 2:]
            count1, firstHalf = mergeSortInversionCount(firstHalf)
            count2, secondHalf = mergeSortInversionCount(secondHalf)
            firstN = len(firstHalf)
            secondN = len(secondHalf)
            secondHalfEnd = secondN
            count3 = count1 + count2
            # Count the inversions in the merge
            # Uses a countdown through each sublist
            for i in range(firstN - 1, -1, -1):
                x = firstHalf[i]
                inversionFound = False
                for j in range(secondHalfEnd - 1, -1, -1):
                    if x > secondHalf[j]:
                        inversionFound = True
                        break
                if inversionFound:
                    # x is larger than secondHalf[0..j], i.e. j + 1 inversions
                    secondHalfEnd = j + 1
                    count3 += j + 1
            mergeList = firstHalf + secondHalf
            mergeList.sort()
            return count3, mergeList

This simply splits the list in half and counts the inversions, sorting the list as it goes. Merge sort is pretty efficient algorithmically (N log N), although practically speaking you could compute it faster using some numpy matrices or by making a small adaptation to the C code of Python's underlying sorting algorithm. Technically, since this approach converts elements of any type into numbers, it basically reduces to sorting a list, so you could use any other list sort to do the same thing, as long as you keep track of the count.
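As a small cross-check for any faster inversion counter, the sketch below builds the index-transformed sequence with a dictionary (avoiding the repeated `index` lookups) and counts inversions by brute force in quadratic time. The names `to_ranks` and `count_inversions_bruteforce` are assumptions made for illustration; the sketch assumes distinct, hashable elements.

```python
from itertools import combinations


def to_ranks(seq, target):
    """Replace each element of seq by its index in target."""
    position = {value: index for index, value in enumerate(target)}
    return [position[value] for value in seq]


def count_inversions_bruteforce(ranks):
    """Count pairs that appear in the wrong order (O(n^2) reference version)."""
    # combinations() yields pairs in their original left-to-right order
    return sum(1 for left, right in combinations(ranks, 2) if left > right)


if __name__ == "__main__":
    a = [9, 1, 2, 3, 0]
    ranks = to_ranks(a, sorted(a))
    print(ranks, count_inversions_bruteforce(ranks))
```

The resulting count is the number of adjacent swaps needed, which is generally larger than the number of "move X before Y" operations asked about in the question.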

With any of these methods (inversion counting, Levenshtein, etc.) you can obviously record the moves as well. Inversion counting records swaps, and logc described a reasonable approach for recording somewhat more general moves with Levenshtein. Personally, I tend to use inversion counts for this because they are pretty simple. But it depends a lot on what you want. If you need more operations than adjacent two-element swaps, Levenshtein is the clear choice.

+1

Namey Feb 17 '14 at 21:57


Perform a Cycle Sort and count the number of moves. This is guaranteed to be the minimum number.
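A minimal sketch of cycle sort for experimenting with this suggestion, assuming distinct elements as in the question. The function name `cycle_sort_writes` is an assumption made for illustration. What it counts is writes into the array, which need not coincide with the "move X before Y" operations from the question: for [9,1,2,3,0] it reports 2, but for [8,1,2,3] it reports 4, while a single move suffices there.

```python
def cycle_sort_writes(items):
    """Sort a copy of items with cycle sort; return (sorted_list, writes).

    Assumes all elements are distinct.
    """
    arr = list(items)
    writes = 0
    for start in range(len(arr) - 1):
        item = arr[start]
        # final position = start + number of smaller elements to the right
        pos = start + sum(1 for x in arr[start + 1:] if x < item)
        if pos == start:
            continue  # already in place
        arr[pos], item = item, arr[pos]
        writes += 1
        # rotate the rest of the cycle until we come back to start
        while pos != start:
            pos = start + sum(1 for x in arr[start + 1:] if x < item)
            arr[pos], item = item, arr[pos]
            writes += 1
    return arr, writes


if __name__ == "__main__":
    print(cycle_sort_writes([9, 1, 2, 3, 0]))
    print(cycle_sort_writes([8, 1, 2, 3]))
```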

0

Ashelly Feb 17 '14 at 20:16


Ivaylo Strandjev · Accepted Answer · 2014-01-30T11:21:26+0000

This problem is equivalent to the longest increasing subsequence problem.

You will need to define a comparison operator less. less(a, b) returns true if and only if a comes before b in the target sequence. Using this comparison operator, compute the maximum increasing subsequence of the source sequence. You will have to move every element that is not part of this subsequence (otherwise the subsequence would not be maximum), and you can move each of them exactly once (moving it to its target position).

EDIT: As requested by Amit, here is my proof of the statement above: Denote the target sequence b and the source sequence a. Let n = |a| and let k be the length of the longest increasing subsequence as described above.

Suppose that b can be reached from a with fewer than n - k moves. This means that at least k + 1 elements of a are not moved. Let s_1, s_2, ..., s_m be the elements that are not moved, listed in the order in which they appear in a. From the assumption, we know that m > k. Since these elements were not moved, their positions relative to each other cannot have changed. Thus the relative positions of all these elements in the target sequence b are the same as in a. Therefore less(s_i, s_j), as defined above, must be true for any i < j. But if that is so, then s_1, s_2, ..., s_m is an increasing subsequence, and since m > k this contradicts the assumption that k is the length of the longest increasing subsequence.

Now let's show an algorithm that reaches b from a by moving all elements except those that are part of the longest increasing subsequence. We move the elements in the order in which they appear in b, skipping the elements that are part of the longest increasing subsequence. If the current element is the first in b, we simply move it to the beginning of the sequence. Otherwise, we move the current element to right after the position of the previous element in b. Note that this previous element may be either an element we moved earlier or an element of the longest increasing subsequence. Note also that at each step, when we are about to move the element with index i, all elements with indexes 1, 2, ..., i-1 already have the correct positions relative to each other.
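The bound from the proof, n - k moves, can be computed without constructing the moves themselves. Here is a minimal sketch, assuming distinct hashable elements; the function name `min_moves` is an assumption made for illustration, and the subsequence length is found with the standard binary-search technique over the elements' positions in the target.

```python
from bisect import bisect_left


def min_moves(a, s):
    """Return n - k, where k is the length of the longest subsequence of a
    whose elements already appear in the same relative order as in s."""
    position = {value: index for index, value in enumerate(s)}
    # tails[j]: smallest possible last key of an increasing
    # subsequence of length j + 1 seen so far
    tails = []
    for value in a:
        key = position[value]
        j = bisect_left(tails, key)
        if j == len(tails):
            tails.append(key)
        else:
            tails[j] = key
    return len(a) - len(tails)


if __name__ == "__main__":
    for a in ([8, 1, 2, 3], [9, 1, 2, 3, 0]):
        print(a, "needs", min_moves(a, sorted(a)), "move(s)")
```

For the two examples in the question this gives 1 and 2, matching the move counts stated there.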

EDIT: adding some code to make the answer clearer. I am not a JavaScript expert, so feel free to correct or criticize my solution.

Let's define a function transform(a, s) that takes two parameters, the lists a and s as described in the question. First I will create a map positions that maps each element of a to its position in s:

    var positions = {};
    for (var i = 0; i < s.length; ++i) {
      // index each element by its position in the target list s
      positions[s[i]] = i;
    }

Now that I have this map, I can define the helper function less described in my answer above. It takes two values a and b (and the helper map just created) and returns true if and only if a comes before b in s (the target list):

    function less(a, b, positions) {
      return positions[a] < positions[b];
    }

I will not describe here how to find the maximum increasing subsequence of a with respect to this comparison operator. You can look at this question for a detailed explanation of how to do that. I will just assume that I have a function defined:

 function max_increasing_subsequence(a, positions)

It returns the maximum increasing subsequence of a with respect to the comparison operator less defined above (using positions), as a list. I will use your second example to illustrate what we have so far:

    A = [9,1,2,3,0]
    S = [0,1,2,3,9]

The values in positions will be as follows:

 positions = { 0 : 0, 1 : 1, 2 : 2, 3 : 3, 9 : 4}

And the result of max_increasing_subsequence(a, positions) will be [1, 2, 3] . By the way, if elements can be repeated in a , it may be better to return indexes instead of elements from max_increasing_subsequence (in this specific example, the difference will not be visible).
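The answer leaves max_increasing_subsequence undefined. As an illustration only, here is a Python sketch of one way such a function could look, not the author's code: it uses the binary-search method with predecessor links to recover the elements, and assumes distinct elements and a `positions` dictionary like the one shown above.

```python
from bisect import bisect_left


def max_increasing_subsequence(a, positions):
    """Return one longest subsequence of a that is increasing
    with respect to positions[x]."""
    tail_keys = []   # tail_keys[k]: smallest last key of a run of length k + 1
    tail_index = []  # index in a of the element holding tail_keys[k]
    previous = [None] * len(a)  # predecessor links for reconstruction
    for i, value in enumerate(a):
        key = positions[value]
        k = bisect_left(tail_keys, key)
        if k > 0:
            previous[i] = tail_index[k - 1]
        if k == len(tail_keys):
            tail_keys.append(key)
            tail_index.append(i)
        else:
            tail_keys[k] = key
            tail_index[k] = i
    # walk the predecessor links back from the end of the longest run
    result = []
    i = tail_index[-1] if tail_index else None
    while i is not None:
        result.append(a[i])
        i = previous[i]
    return result[::-1]


if __name__ == "__main__":
    positions = {0: 0, 1: 1, 2: 2, 3: 3, 9: 4}
    print(max_increasing_subsequence([9, 1, 2, 3, 0], positions))
```

On the second example from the question this returns [1, 2, 3], the same subsequence the answer uses.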

Now I will create another auxiliary map to indicate which elements are included in the maximum ascending subsequence:

    var included = {};
    var l = max_increasing_subsequence(a, positions);
    for (var i = 0; i < l.length; ++i) {
      included[l[i]] = true;
    }

Now you can finish the solution with a single iteration over s . I will add a special case for the last element to make the code more understandable:

    if (!(s[s.length - 1] in included)) {
      console.log("Move " + s[s.length - 1] + " at the end");
    }
    for (var i = s.length - 2; i >= 0; --i) {
      if (!(s[i] in included)) {
        console.log("Move " + s[i] + " before " + s[i + 1]);
      }
    }

Note that in the solution above I assume that each time you log a command, it is logged with respect to the ordering of array a as it stands right after all previous commands have been executed.
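One way to check this convention is to replay the commands on a copy of the list, applying each one to the current state. The sketch below is a Python illustration under stated assumptions: the function name `apply_moves` is hypothetical, elements are distinct, and the set of elements that stay in place (`kept`) is passed in rather than computed.

```python
def apply_moves(a, s, kept):
    """Replay the 'move X before Y' commands on a copy of a.

    Returns (final_list, moves); a move is (x, y), with y None for 'at the end'.
    """
    current = list(a)
    moves = []
    # walk the target from its last element to its first
    for i in range(len(s) - 1, -1, -1):
        x = s[i]
        if x in kept:
            continue  # part of the increasing subsequence: stays where it is
        current.remove(x)
        if i == len(s) - 1:
            current.append(x)
            moves.append((x, None))
        else:
            # s[i + 1] is already correctly placed relative to the suffix
            current.insert(current.index(s[i + 1]), x)
            moves.append((x, s[i + 1]))
    return current, moves


if __name__ == "__main__":
    final, moves = apply_moves([9, 1, 2, 3, 0], [0, 1, 2, 3, 9], {1, 2, 3})
    print(final, moves)
```

For the second example the replay ends in the sorted list after two moves: 9 to the end, then 0 before 1.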

So, overall, I think transform should look something like this:

    function transform(a, s) {
      // map each element to its position in the target list s
      var positions = {};
      for (var i = 0; i < s.length; ++i) {
        positions[s[i]] = i;
      }
      // mark the elements of the maximum increasing subsequence
      var included = {};
      var l = max_increasing_subsequence(a, positions);
      for (var i = 0; i < l.length; ++i) {
        included[l[i]] = true;
      }
      if (!(s[s.length - 1] in included)) {
        console.log("Move " + s[s.length - 1] + " at the end");
      }
      for (var i = s.length - 2; i >= 0; --i) { // note s.length - 2 - don't process last element
        if (!(s[i] in included)) {
          console.log("Move " + s[i] + " before " + s[i + 1]);
        }
      }
    }

Hope this code makes my answer clearer.
