Sort by previous sorts

I am trying to sort the list of identifiers based on a "sorting map", which is an array of tuples (ID1, ID2, timestamp) , which determines which identifiers should be sorted before other identifiers. Here are the rules:

  • ID1 should be sorted to ID2 .
  • A timestamp can be used to break links with newer timestamps that exceed older timestamps. for example, a given sort key (C, A, 1/1/1900), (C, B, 1/1/2000) , then B sorted to A
  • there may be cycles, for example. (A, B, 1/1/1950), (B, C, 1/1/1980), (C, A, 1/1/1900) . The timestamp can be used to interrupt the cycles, while the records with the old timestamps in the cycle are deleted from the sort card until the cycle disappears.
  • If the identifier is not on the sort map, it is sorted after any identifier that is present in the sort map

Example: to display the sort (C, A, 1/1/1900), (C, B, 1/1/2000) and the list (A, B, C, D) for sorting, the sorted result will be (C, B, A, D) .

I am puzzled by turning these rules into an algorithm. Here is what I still have:

  • Get the latest sorting map from the database. I get a maximum of one entry for each unique pair of identifiers.

  • Removing loops from a sorting chart. How? Or is it easier to just ignore loops as part of step 4?

  • Convert a sort map into memory for optimal performance. For example, build a hash table whose key is each unique identifier on the sorting map, so I can quickly find all sorting lines containing a specific identifier.

  • Sort my identifier array using a shared binary sort library using a custom comparison function that accepts any two identifiers ID1 and ID2 . Comparison function:

    but. View all sorting records containing ID1 or ID2 using the hash table from step # 3.

    b. If I already have an entry containing both ID1 and ID2 on the sorting card, stop - we know which one should be the first!

    from. If neither ID1 nor ID2 are found on the sort map, then this is a tie. Return deterministically arbitrary results (e.g. lower id wins).

    e. If one identifier is on the sorting map and the other is not, stop. Found should be sorted first.

    e. If we get here, both identifiers are on the sort map, but there is no direct comparison on the sort map. Now what?

Performance does not cause much concern, since the maximum size of the sorting map is less than 20 thousand lines, and the maximum number of sorted identifiers is less than 30.

Any ideas?

FWIW, we will use the .NET List<T>.Sort(Comparison<T>) to do the sorting in C #, but the underlying algorithm is obviously a language and agnostic platform.


If you're interested, the real need for this algorithm is needed here:

Our company is building mobile applications for delivery drivers who visit about 20 places every day from the territory of 100-150 of all places for which they are responsible. A list of places every day is dynamically assigned based on the inventory of each location. Places that have a low inventory receive delivery of new inventory, and places where there is still enough inventory are not visited.

Drivers can visit places in any order, but they usually follow the same routes every day (for example, they visit places in the southern part of the city when traffic is shining in the morning, and then they visit places in the northern part of the city when traffic is heavier than the South).

We decided not to use third-party routing software, which automatically determines the most efficient route. Instead, we found it better to allow the driver to choose a route because the routing program has a difficult time with restrictions such as “that the docking station is usually only available until 7AM” or “the guy who needs to sign the delivery receipt, Fridays, which have a big impact on supply schedules.

In any case, we would like to use the historical driver selection to sort the route every day in the same order in which the driver visited the same locations for the last time. This will give the driver a beautifully designed route every day that matches his preferences, without having to manually re-schedule, except in unusual cases. This will save the driver a minute or two every day, which will add up over time.

Each historical route is indeed a list like this (ID1, ID2, ID3, ..., IDN, timestamp), but as an alternative to keeping hundreds of past schedules, I thought it would be easier to put each N-machine historical route into pairs of cars . This means that I have to store at most N * N-1 tuples, because new orders always push the old ones out of the sorting map. If this is a poor simplification, let me know .; -)

+7
source share
2 answers

What you are looking for is called Topological Sort . Using this search query, you are likely to find very good resources.

There is one complication in your specific domain: Cycles (because drivers behave inconsistently over time). You are correct in that you need to break the dependency cycles, because otherwise topological sorting will not work.

You also need to break all loops longer than two.

Let us consider your ID-card as a graph: identifiers (places) are nodes. The entries on your map are edges (from ID1 to ID2). An easy way to do this would be to:

 while true allCycles = getListOfAllCycles(); if (allCycles.length == 0) break; breakNode = chooseBreakNode(allCycles); //defined later deleteBreakNodeFrom(allCycles); chooseBreakNode: chose the node which has been driven to the least //node is not important if ambiguous: chose the node in the dependency graph which is present in the highest number of cycles //breaks multiple cycles at once if ambiguous: chose the node which is in the longest cycle if ambiguous: pick an arbitrary node 

I probably did not get chooseBreakNode quite correctly. This is a heuristic that you can customize to your needs.

+3
source

I suggest an alternative approach, but let me know if I don’t understand what business needs.

There is a table of type (DriverId, LocationId, Priority), which saves the relative priority of locations for each driver.

Anytime you need to process a completed route, start at the bottom of the list (last visited location) and run the following algorithm for each location, going up the list:

  • If the priority of the location no longer exceeds the priority of the location below it, then newPriority = priorityBelow + 1. (If there is nothing lower, priorityBelow = 0)

When you finish processing the list, re-normalize the priority points as 1,2,3 ... (making the lowest priority = 1, the second minimum = 2, etc.)

Then, when you need to order a new route, you simply order seats according to their relative priority values ​​for this driver.

Have you considered this approach?


EDIT: Adding example code for the comment below.

given 4 historical routes: ABCD (newest), ACBE, CBDF, CBDFA (oldest), how would I sort the new ABCDEF route?

 static Dictionary<string, int> Priorities = new Dictionary<string, int>(); static void Main(string[] args) { var itineraries = new string[][]{ new string[] { "C", "B", "D", "F", "A" }, new string[] { "C", "B", "D", "F" }, new string[] { "A", "C", "B", "E" }, new string[] { "A", "B", "C", "D" } }; //process past itineraries foreach (var itinerary in itineraries) ProcessItinerary(itinerary); //sort new itinerary string[] newItinerary = { "A", "B", "C", "D", "E", "F" }; string[] sortedItinerary = newItinerary.OrderByDescending( x => Priorities.ContainsKey(x) ? Priorities[x] : 1).ToArray(); Console.WriteLine(String.Concat(sortedItinerary)); Console.ReadKey(); } static void ProcessItinerary(string[] itinerary) { itinerary.Reverse().Aggregate((below, above) => { int priBelow = Priorities.ContainsKey(below) ? Priorities[below] : Priorities[below] = 1; if (!(Priorities.ContainsKey(above) && Priorities[above] > priBelow)) Priorities[above] = priBelow + 1; return above; }); //normalize priorities // (note: running in reverse so that if priorities tie, // the older location has higher priority) int i = Priorities.Count; foreach (var pair in Priorities.OrderByDescending(x => x.Value)) Priorities[pair.Key] = i--; } 

It will print: ABCDFE

0
source

All Articles