Sorting a list, knowing the results of how some items are compared?

The goal is to sort the list X of n unknown variables {x0, x1, x2, ... x (n-1)} using list C of the comparison results m (booleans). Each comparison is performed between two of n variables, for example. x2 <x5, and a pair of indices for each of the comparisons are fixed and predefined. Also given: all pairs in C are unique (even when upside down, for example, the pair x0, x1 means that there is no pair x1, x0) and never compare the variable with itself. This means that C has at most n * (n-1) / 2 entries.

So the question is, can I prove that my list C of m comparisons is sufficient to sort the list X? Obviously, this would be if C were the longest possible length (had all possible comparisons). But what about shorter lists?

Then, if it has been proven that C contains enough information to sort, then I actually do the sorting.

+4
source share
2 answers

Imagine that you have a collection of objects that you need to sort, and create a graph of them with one node per object. Then you are presented with a list of pairs indicating how the comparisons go. You can think of them as edges in a graph: if you know that object x compares less than object y, you can draw an edge from x to y.

Assuming the comparison results are consistent, i.e. you do not have cycles; you must have a directed acyclic graph.

Think about what happens if you topologically sort this DAG. What you get is one possible order of values ​​that meets all the restrictions. The reason for this is that, in a topological order, you do not put the element x in front of the element y if there is any transitive series of edges leading from y to x, and there is a transitive series of edges leading from y to x if there is a chain of comparisons, which transitively indicates that y precedes x.

In fact, you can make a stronger requirement: the set of all topological orders of the DAG is exactly the set of all possible orders that satisfy all the restrictions. We have already argued that every topological ordering satisfies all the constraints, so all we need to do now is to assert that every sequence that satisfies all the constraints is an admissible topological ordering. The argument here basically is that if you obey all the restrictions, you never put any element in the sequence before it transitively compares less, so you never put any element in the sequence before it has way to it.

This gives us a good way to solve the problem: take a graph formed in this way and see if it has exactly one topological order. If so, then this ordering is a unique sorted order. If not, then there are two or more orderings.

So what is the best way to do this? Well, one of the standard algorithms for topological sorting is to annotate each node with its degree of uncertainty, then repeatedly stretch the node to zero zero and adjust the indices of its successors. A DAG has exactly one topological ordering, if during the execution of this algorithm at each stage there is exactly one node from zero, since in this case topological ordering is forced.

With proper configuration and data structures, you can implement this to execute in time O (n + m), where n is the number of nodes and m is the number of restrictions. I will leave these details as the notorious exercise for the reader. :-)

+4
source

Your problem can be reduced to known topological sorting .

To prove that “C contains enough information for sorting” is to prove the uniqueness of topological sorting:

If topological sorting has the property that all pairs of consecutive vertices in sorted order are connected by edges, then these edges form a directed Hamiltonian path in the DAG. If there is a Hamiltonian path, then the topological sort order is unique; no other order respects the edge of the path. Conversely, if the topological sorting does not form a Hamiltonian trajectory, the DAG will have two or more admissible topological orders, since in this case it is always possible to form a second admissible ordering, replacing two consecutive vertices that are not connected by an edge to each other. Therefore, in linear time it is possible to check whether a unique ordering exists and whether the Hamiltonian path exists, despite the NP-hardness of the Hamiltonian path problem for more general directed graphs (Vernet & Markenzon, 1997).
+2
source

All Articles