Sort list by index order

Suppose I have the following two sequences:

    val index = Seq(2, 5, 1, 4, 7, 6, 3)
    val unsorted = Seq(7, 6, 5, 4, 3, 2, 1)

The first is the index by which I want to sort the second. My current solution is to iterate over the index and build a new sequence from the elements found in the unsorted sequence.

    val sorted = index.foldLeft(Seq[Int]()) { (s, num) =>
      s ++ Seq(unsorted.find(_ == num).get)
    }

But this solution seems very inefficient and error prone. On every iteration it searches the complete unsorted sequence. And if the index and the unsorted list are not in sync, then either an error will be thrown or an element will be omitted. In both cases, the elements that are not in sync should be appended to the ordered sequence.
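To make the two failure modes concrete, here is a minimal sketch. The values 9 and 8 are made up for illustration and are not part of the example above; the object name is hypothetical as well.

```scala
import scala.util.Try

object OutOfSyncDemo {
  val index    = Seq(2, 5, 1, 9) // 9 has no match in `unsorted` (made-up value)
  val unsorted = Seq(5, 2, 1, 8) // 8 is not referenced by `index` (made-up value)

  // Same fold as above: `.get` on the failed lookup for 9 throws NoSuchElementException.
  val attempt: Try[Seq[Int]] = Try {
    index.foldLeft(Seq[Int]()) { (s, num) => s ++ Seq(unsorted.find(_ == num).get) }
  }

  // If the failed lookup is skipped instead, 8 silently disappears from the result.
  val lenient: Seq[Int] = index.flatMap(num => unsorted.find(_ == num))
}
```

Here `attempt` ends up as a failure, and `lenient` contains only 2, 5 and 1.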

Is there a more efficient and reliable solution to this problem? Or is there a sorting algorithm that fits this paradigm?


Note: This is a constructed example. In reality, I would like to sort a list of MongoDB documents by an ordered list of document ids.


Update 1

I chose the answer from Marius Danila because it seems to be the faster and more Scala-ish solution for my problem. It does not handle the elements that are not in sync, but that can be easily implemented.

So here is the updated solution:

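    import scala.collection.immutable.HashMap
    import scala.collection.mutable.ArrayBuffer
    import scala.reflect.ClassTag

    def sort[T: ClassTag, Key](index: Seq[Key], unsorted: Seq[T], key: T => Key): Seq[T] = {
      // Map each key to its target position in the sorted result.
      val positionMapping = HashMap(index.zipWithIndex: _*)
      // One slot per index entry, so every mapped position is in bounds.
      val inSync = new Array[T](index.size)
      val notInSync = new ArrayBuffer[T]()
      for (item <- unsorted) {
        if (positionMapping.contains(key(item))) {
          inSync(positionMapping(key(item))) = item
        } else {
          notInSync.append(item)
        }
      }
      // Caveat: the null check only drops empty slots for reference types;
      // for primitives such as Int an empty slot holds the default value 0.
      inSync.filterNot(_ == null) ++ notInSync
    }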

Update 2

The approach suggested by Bask.cc seems to be the right answer. It also does not handle the not-in-sync problem, but that too can be easily implemented.

    val index: Seq[String]
    val entities: Seq[Foo]
    val idToEntityMap = entities.map(e => e.id -> e).toMap
    val sorted = index.map(idToEntityMap)
    val result = sorted ++ entities.filterNot(sorted.toSet)
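A self-contained sketch of that idea, in case it helps. `Foo` here is a hypothetical case class standing in for the real document type, `sortByIds` is a made-up name, and ids are assumed to be unique. Note that `index.map(idToEntityMap)` throws if an id has no matching entity, so this version uses `flatMap` with `get` to skip such ids instead.

```scala
object SortByIdsSketch {
  // Hypothetical document type, for illustration only.
  final case class Foo(id: String, payload: String)

  def sortByIds(index: Seq[String], entities: Seq[Foo]): Seq[Foo] = {
    val idToEntity = entities.map(e => e.id -> e).toMap
    // flatMap + get skips ids that have no matching entity instead of throwing.
    val sorted = index.flatMap(idToEntity.get)
    val seen = sorted.map(_.id).toSet
    // Entities whose id is not in the index keep their original relative order.
    sorted ++ entities.filterNot(e => seen(e.id))
  }

  val result: Seq[Foo] = sortByIds(
    Seq("b", "x", "a"),
    Seq(Foo("a", "first"), Foo("c", "extra"), Foo("b", "second"))
  )
}
```

With this input, "x" is ignored because no entity carries that id, and the entity with id "c" is appended after the ones named in the index.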
+8
sorting scala
7 answers

Why do you want to sort the collection when you already have the sorted index collection? You can just use map.

Regarding "Actually, I would like to sort the list of MongoDB documents by an ordered list of document identifiers":

    val ids: Seq[String]
    val entities: Seq[Foo]
    val idToEntityMap = entities.map(e => e.id -> e).toMap
    // A Map is itself a function from key to value, so it can be passed to map directly.
    ids.map(idToEntityMap)
+4

This may not match your use case, but Googlers may find this useful:

    scala> val ids = List(3, 1, 0, 2)
    ids: List[Int] = List(3, 1, 0, 2)

    scala> val unsorted = List("third", "second", "fourth", "first")
    unsorted: List[String] = List(third, second, fourth, first)

    scala> val sorted = ids map unsorted
    sorted: List[String] = List(first, second, third, fourth)
+2

I do not know the language you are using. But regardless of the language, this is how I would solve the problem.

From the first list (here "index"), create a hash table that takes the document id as the key and the document's position in the sorted order as the value.

Now, when traversing the list of documents, I would look up the hash table using the document id to get the position the document should have in the sorted order. Then I would use that position to place the document in pre-allocated memory.

Note: If the number of documents is small, then instead of a hash table you could use a pre-allocated table and index it directly using the document id.
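The directly indexed table from the note could look roughly like this in Scala. This is only a sketch under the assumption that ids are small non-negative integers; `sortByIndex`, `position` and `slots` are names made up for illustration.

```scala
object DirectIndexSketch {
  // Assumes ids are small non-negative Ints, so the id itself can index the table.
  def sortByIndex(index: Seq[Int], unsorted: Seq[Int]): Seq[Int] = {
    val maxId = if (index.isEmpty) -1 else index.max
    // position(id) = rank of that id in the sorted order, or -1 if the id is not in the index.
    val position = Array.fill(maxId + 1)(-1)
    for ((id, pos) <- index.zipWithIndex) position(id) = pos

    // Pre-allocated result slots, one per index entry.
    val slots = Array.fill[Option[Int]](index.size)(None)
    for (item <- unsorted if item >= 0 && item <= maxId && position(item) >= 0)
      slots(position(item)) = Some(item)

    // Slots that were never filled (ids missing from `unsorted`) are dropped.
    slots.toSeq.flatten
  }

  val sorted: Seq[Int] = sortByIndex(Seq(2, 5, 1, 4, 7, 6, 3), Seq(7, 6, 5, 4, 3, 2, 1))
}
```

The table costs memory proportional to the largest id, which is why this only makes sense when ids are small and dense.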

+1

Flat mapping the index over the unsorted list seems like a safer version (if an index entry is not found, it is simply dropped, because find returns None):

 index.flatMap(i => unsorted.find(_ == i)) 

It still has to traverse the unsorted list every time (in the worst case, this is O(n^2)). With your example, I'm not sure there is a more efficient solution.

+1

The best I can do is create a Map from the unsorted data and use map lookups (basically the hash table suggested by a previous poster). The code looks like this:

    val unsortedAsMap = unsorted.map(x => x -> x).toMap
    index.map(unsortedAsMap)

Or, if there is a possibility of lookup misses:

    val unsortedAsMap = unsorted.map(x => x -> x).toMap
    index.flatMap(unsortedAsMap.get)

This is O(n) in time*, but you are trading time for space, because it uses O(n) space.

For a slightly more complex version that handles missing values, try:

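    import scala.collection.mutable.{LinkedHashMap, ListBuffer}

    // Scala's mutable LinkedHashMap is used here instead of java.util.LinkedHashMap:
    // it keeps insertion order, and remove returns an Option rather than null.
    val unsortedAsMap = LinkedHashMap.empty[Int, Int]
    for (i <- unsorted) unsortedAsMap.put(i, i)

    val newBuffer = ListBuffer.empty[Int]
    for (i <- index) {
      unsortedAsMap.remove(i) match {
        case Some(v) => newBuffer += v
        case None    => // Not sure what to do for "else"
      }
    }
    // Append whatever the index did not reference, in original order.
    for ((_, v) <- unsortedAsMap) newBuffer += v
    newBuffer.result()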

If this is a MongoDB database in the first place, you might be better off retrieving the documents directly from the database by index, so something like:

 index.map(lookupInDB) 

* Technically, this is O(n log n), since the standard immutable Scala map is O(log n), but you can always use a mutable map, which is O(1).

+1

In this case, you can use zip-sort-unzip:

    (unsorted zip index).sortWith(_._2 < _._2).unzip._1
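A small worked example on the data from the question, as a sketch of what this expression computes. It treats `index(i)` as the rank of `unsorted(i)`, which is a positional reading of the index rather than a lookup by value, so the result differs from the lookup-based answers. It also assumes both sequences have the same length, since zip truncates to the shorter one. The object name is made up.

```scala
object ZipSortUnzipDemo {
  val index    = Seq(2, 5, 1, 4, 7, 6, 3)
  val unsorted = Seq(7, 6, 5, 4, 3, 2, 1)

  // Pairs are (7,2), (6,5), (5,1), (4,4), (3,7), (2,6), (1,3); sorting by the
  // second component moves unsorted(i) to rank index(i).
  val byRank: Seq[Int] = (unsorted zip index).sortWith(_._2 < _._2).unzip._1
}
```

Here `byRank` is 5, 7, 1, 4, 6, 2, 3, whereas looking elements up by value would give 2, 5, 1, 4, 7, 6, 3.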

By the way, if you can, the best solution would be to sort the list on the database side using $orderBy.

+1

Ok

Let's start from the very beginning. Besides the fact that you scan the unsorted list every time, the Seq object creates a List collection by default. So in the foldLeft you append an element at the end of the list every time. Each append is O(N), which makes the whole fold an O(N^2) operation.

An improvement would be:

    val sorted_rev = index.foldLeft(Seq[Int]()) { (s, num) =>
      unsorted.find(_ == num).get +: s
    }
    val sorted = sorted_rev.reverse

But this is still an O(N^2) algorithm. We can do better.

The following sort function should work:

    import scala.collection.immutable.HashMap
    import scala.reflect.ClassTag

    def sort[T: ClassTag, Key](index: Seq[Key], unsorted: Seq[T], key: T => Key): Seq[T] = {
      val positionMapping = HashMap(index.zipWithIndex: _*) //1
      val arr = new Array[T](unsorted.size) //2
      for (item <- unsorted) { //3
        val position = positionMapping(key(item))
        arr(position) = item
      }
      arr //6
    }

The function sorts the list of unsorted elements using the sequence of indexes index, where the key function is used to extract the identifier from the objects you are trying to sort.

Line 1 creates an inverse index, mapping each object identifier to its final position.

Line 2 allocates an array that will contain the sorted sequence. We use an array because we need constant-time writes at arbitrary positions.

The loop that starts on line 3 traverses the sequence of unsorted elements and places each element in its intended position, using the positionMapping inverse index.

Line 6 returns the array, converted implicitly to a Seq using the WrappedArray wrapper.

Since our inverse index is an immutable HashMap, a lookup should take constant time in regular cases. Building the inverse index takes O(N_Index) time, where N_Index is the size of the index sequence. Traversing the unsorted sequence takes O(N_Unsorted) time, where N_Unsorted is the size of the unsorted sequence.

So the complexity is O(max(N_Index, N_Unsorted)), and I think this is the best you can do under the circumstances.

In your specific example, you call the function as follows:

 val sorted = sort(index, unsorted, identity[Int]) 

For a real case, this is likely to be as follows:

 val sorted = sort(idList, unsorted, obj => obj.id) 
+1