Memory-efficient algorithm for the "take n (sort xs)" ("sorted prefix") problem

I want to take the n largest items from a lazy list.

I've heard that the mergesort implemented in Data.List.sort is lazy and does not produce more elements than necessary. That may be true in terms of the number of comparisons, but it is certainly not the case when it comes to memory use. The following program illustrates the problem:

    {-# LANGUAGE ScopedTypeVariables #-}
    module Main where

    import qualified Data.Heap as Heap
    import qualified Data.List as List
    import System.Random.MWC
    import qualified Data.Vector.Unboxed as Vec
    import System.Environment

    limitSortL n xs = take n (List.sort xs)

    limitSortH n xs = List.unfoldr Heap.uncons
        (List.foldl' (\acc x -> Heap.take n (Heap.insert x acc)) Heap.empty xs)

    main = do
      st <- create
      rxs :: [Int] <- Vec.toList `fmap` uniformVector st (10^7)
      args <- getArgs
      case args of
        ["LIST"] -> print (limitSortL 20 rxs)
        ["HEAP"] -> print (limitSortH 20 rxs)
      return ()

Runtime:

Data.List:

  ./lazyTest LIST +RTS -s
  [-9223371438221280004,-9223369283422017686,-9223368296903201811,-9223365203042113783,-9223364809100004863,-9223363058932210878,-9223362160334234021,-9223359019266180408,-9223358851531436915,-9223345045262962114,-9223343191568060219,-9223342956514809662,-9223341125508040302,-9223340661319591967,-9223337771462470186,-9223336010230770808,-9223331570472117335,-9223329558935830150,-9223329536207787831,-9223328937489459283]
    2,059,921,192 bytes allocated in the heap
    2,248,105,704 bytes copied during GC
      552,350,688 bytes maximum residency (5 sample(s))
        3,390,456 bytes maximum slop
             1168 MB total memory in use (0 MB lost due to fragmentation)

   Generation 0: 3772 collections, 0 parallel, 1.44s, 1.48s elapsed
   Generation 1: 5 collections, 0 parallel, 0.90s, 1.13s elapsed

   INIT time 0.00s (0.00s elapsed)
   MUT time 0.82s (0.84s elapsed)
   GC time 2.34s (2.61s elapsed)
   EXIT time 0.00s (0.00s elapsed)
   Total time 3.16s (3.45s elapsed)

   % GC time 74.1% (75.7% elapsed)

   Alloc rate 2,522,515,156 bytes per MUT second

   Productivity 25.9% of total user, 23.7% of total elapsed

Data.Heap:

  ./lazyTest HEAP +RTS -s
  [-9223371438221280004,-9223369283422017686,-9223368296903201811,-9223365203042113783,-9223364809100004863,-9223363058932210878,-9223362160334234021,-9223359019266180408,-9223358851531436915,-9223345045262962114,-9223343191568060219,-9223342956514809662,-9223341125508040302,-9223340661319591967,-9223337771462470186,-9223336010230770808,-9223331570472117335,-9223329558935830150,-9223329536207787831,-9223328937489459283]
  177,559,536,928 bytes allocated in the heap
      237,093,320 bytes copied during GC
       80,031,376 bytes maximum residency (2 sample(s))
          745,368 bytes maximum slop
               78 MB total memory in use (0 MB lost due to fragmentation)

   Generation 0: 338539 collections, 0 parallel, 1.24s, 1.31s elapsed
   Generation 1: 2 collections, 0 parallel, 0.00s, 0.00s elapsed

   INIT time 0.00s (0.00s elapsed)
   MUT time 35.24s (35.46s elapsed)
   GC time 1.24s (1.31s elapsed)
   EXIT time 0.00s (0.00s elapsed)
   Total time 36.48s (36.77s elapsed)

   % GC time 3.4% (3.6% elapsed)

   Alloc rate 5,038,907,812 bytes per MUT second

   Productivity 96.6% of total user, 95.8% of total elapsed

Clearly limitSortL is much faster, but it is also very memory hungry. On larger lists it eats up all my RAM.

Is there a faster algorithm to solve this problem that is not memory hungry?

Edit: Clarification: I am using Data.Heap from the heaps package; I have not tried the heap package.

+7
6 answers

So, I actually managed to solve the problem. The idea is to throw away the fancy data structures and work by hand ;-) Basically, we split the input list into chunks, sort them, and fold the resulting [[Int]] list, selecting the n smallest elements at each step. The trickiest part is merging the accumulator with the sorted chunk in the proper order. We have to use seq or laziness will bite and the result will still need a lot of memory to compute. In addition, I fuse merge with take n to optimize things a bit further. Here is the whole program, together with the previous attempts:

    {-# LANGUAGE ScopedTypeVariables, PackageImports #-}
    module Main where

    import qualified Data.List as List
    import qualified Data.List.Split as Split
    import qualified "heaps" Data.Heap as Heap -- qualified import from the "heaps" package
    import System.Random.MWC
    import qualified Data.Vector.Unboxed as Vec
    import System.Environment

    limitSortL n xs = take n (List.sort xs)

    limitSortH n xs = List.unfoldr Heap.uncons
        (List.foldl' (\acc x -> Heap.take n (Heap.insert x acc)) Heap.empty xs)

    takeSortMerge n inp =
        List.foldl' (\acc lst -> merge n acc (List.sort lst)) []
                    (Split.splitEvery n inp)
      where
        merge 0 _ _   = []
        merge _ [] xs = xs
        merge _ ys [] = ys
        merge f (x:xs) (y:ys)
          | x < y     = let tail = merge (f-1) xs (y:ys) in tail `seq` (x:tail)
          | otherwise = let tail = merge (f-1) (x:xs) ys in tail `seq` (y:tail)

    main = do
      st <- create
      let n1 = 10^7
          n2 = 20
      rxs :: [Int] <- Vec.toList `fmap` uniformVector st n1
      args <- getArgs
      case args of
        ["LIST"]  -> print (limitSortL n2 rxs)
        ["HEAP"]  -> print (limitSortH n2 rxs)
        ["MERGE"] -> print (takeSortMerge n2 rxs)
        _         -> putStrLn "Nothing..."
      return ()

Runtime performance, memory consumption, GC time:

    LIST      3.96s   1168 MB   75%
    HEAP     35.29s     78 MB   3.6%
    MERGE     1.00s     78 MB   3.0%
    just rxs  0.21s     78 MB   0.0%   (just evaluating the random vector)
+4

There are a number of selection algorithms designed specifically for this. The partition-based algorithm is the "classic" one, but, like quicksort, it is not really suitable for Haskell lists. The Wikipedia article doesn't show much related to functional programming, although I suspect that the “tournament selection” it describes is the same as, or not very different from, your current merge solution.

If you are concerned about memory consumption, you can use a priority queue: it uses O(k) memory and O(n log k) time:

    queue := first k elements
    for each element in the rest:
        add the element to the queue
        remove the largest element from the queue
    convert the queue to a sorted list
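
A minimal Haskell sketch of this pseudocode, assuming Data.Heap from the "heaps" package (the library the question already uses); smallestK is a name made up here. A min-heap of Down values acts as a max-heap, so the heap's minimum is the largest of the k candidates and is the one that gets evicted; as a small shortcut, elements that cannot enter the top k are skipped instead of being inserted and immediately removed:

    import qualified Data.Heap as Heap   -- Data.Heap from the "heaps" package
    import qualified Data.List as List
    import Data.Ord (Down (..))

    -- Keep the k smallest elements: O(k) memory, O(n log k) time.
    smallestK :: Ord a => Int -> [a] -> [a]
    smallestK k xs
      | k <= 0    = []
      | otherwise = reverse [ x | Down x <- List.unfoldr Heap.uncons heap ]
      where
        heap = List.foldl' step Heap.empty xs
        step acc x
          | Heap.size acc < k         = Heap.insert (Down x) acc
          | Down x > Heap.minimum acc = Heap.insert (Down x) (Heap.deleteMin acc)
          | otherwise                 = acc

On the question's data, smallestK 20 rxs should give the same elements as take 20 (List.sort rxs), while keeping only k heap nodes live at any point.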
+3

“Quicksort and k-th smallest elements”, from the always interesting Heinrich Apfelmus: http://apfelmus.nfshost.com/articles/quicksearch.html

+2

Sorry if I cannot decipher

  Vec.toList `fmap` uniformVector st (10^7) 

but how long will this list be? Isn't it clear that no matter how lazy the merge is, it has to realise the whole list at least once?

Update:

I heard that mergesort implemented in Data.List.sort is lazy and it does not produce more elements than necessary.

This says nothing about the space mergesort has to consume before it can start delivering the first elements of the list. In any case it will have to walk (and thereby realise) the entire list, split it up into sublists to merge, and so on. Here is a quote from http://www.inf.fh-flensburg.de/lang/algorithmen/sortieren/merge/mergen.htm

The drawback of merging is that it requires additional space of Θ(n) for the temporary array b.

There are different possibilities for implementing the merge function. The most efficient of these is variant b: it requires only half as much extra space, it is faster than the other variants, and it is stable.

+1

Perhaps you are misdiagnosing the problem: it may be a case of too much laziness rather than too little.

Maybe you should try a stricter data structure, or a mutable array in the ST monad.

With the mutable array approach, you can limit the number of moves per insert to n/2 instead of n-1 by keeping an index h that "points to" the head of the queue stored in the array and letting the queue "wrap around" inside the array; a rough sketch of that idea follows.
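
The sketch below is my own reading of that suggestion, not this answer's code: the k smallest Ints seen so far live in an unboxed STUArray used as a sorted circular buffer, href holds the cell index of the queue's head, and each insert shifts whichever side of the insertion point is shorter, so at most about k/2 cells move. smallestKArr and insertWrap are invented names.

    import Control.Monad (forM_, when)
    import Control.Monad.ST (ST, runST)
    import Data.Array.ST (STUArray, getElems, newListArray, readArray, writeArray)
    import Data.List (sort)
    import Data.STRef (STRef, newSTRef, readSTRef, writeSTRef)

    -- Keep the k smallest Ints seen so far in a sorted circular buffer.
    smallestKArr :: Int -> [Int] -> [Int]
    smallestKArr k xs
      | k <= 0         = []
      | length pre < k = sort pre
      | otherwise      = runST $ do
          arr  <- newListArray (0, k - 1) (sort pre)  -- array type pinned by insertWrap below
          href <- newSTRef 0                          -- cell index of logical slot 0 (the minimum)
          forM_ rest (insertWrap k arr href)
          h    <- readSTRef href
          els  <- getElems arr
          return (drop h els ++ take h els)           -- rotate back into ascending order
      where
        (pre, rest) = splitAt k xs

    -- Insert x if it beats the current maximum (logical slot k-1), shifting
    -- whichever side of the insertion point is shorter: at most about k/2 moves.
    insertWrap :: Int -> STUArray s Int Int -> STRef s Int -> Int -> ST s ()
    insertWrap k arr href x = do
      h <- readSTRef href
      let cell i = (h + i) `mod` k                    -- cell holding logical slot i
      biggest <- readArray arr (cell (k - 1))
      when (x < biggest) $ do
        p <- findSlot cell 0                          -- sorted position for x
        if p <= k - 1 - p
          then do                                     -- shift slots 0..p-1 one cell towards the head
            forM_ [0 .. p - 1] $ \j ->
              readArray arr (cell j) >>= writeArray arr (cell (j - 1))
            writeArray arr (cell (p - 1)) x
            writeSTRef href ((h - 1) `mod` k)         -- the queue "wraps" one cell backwards
          else do                                     -- shift slots p..k-2 one cell towards the tail
            forM_ (reverse [p .. k - 2]) $ \j ->
              readArray arr (cell j) >>= writeArray arr (cell (j + 1))
            writeArray arr (cell p) x
      where
        -- linear scan for the first slot whose element is >= x
        findSlot cell j
          | j >= k - 1 = return (k - 1)
          | otherwise  = do
              v <- readArray arr (cell j)
              if v >= x then return j else findSlot cell (j + 1)

Whether this actually beats the chunked merge solution would need measuring; the sketch is only meant to show how the wrap-around head index halves the worst-case shifting.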

0

Memory efficiency is rarely Haskell's strong point. Still, it is not that hard to write a sorting algorithm that is lazier than mergesort. For example, here is a simple quicksort:

    qsort [] = []
    qsort (x:xs) = qcombine (qsort a) b (qsort c)
      where
        (a,b,c) = qpart x (x:xs) ([],[],[])

    qpart _ [] ac = ac
    qpart n (x:xs) (a,b,c)
      | x > n     = qpart n xs (a,b,x:c)
      | x < n     = qpart n xs (x:a,b,c)
      | otherwise = qpart n xs (a,x:b,c)

    qcombine (a:as) b c = a : qcombine as b c
    qcombine [] (b:bs) c = b : qcombine [] bs c
    qcombine [] [] c = c

I used explicit recursion to make it clear what is going on. Each part here is genuinely lazy, which means qcombine will never call qsort c unless its elements are needed. This should reduce memory usage if you only want the first few elements.
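
For instance, taking a prefix of this lazy quicksort gives a drop-in replacement for limitSortL; limitSortQ is just a name made up here, assuming the qsort above is in scope:

    -- Usage sketch: the recursive sorts on the right-hand partitions are
    -- only forced if the requested prefix actually reaches them.
    limitSortQ :: Ord a => Int -> [a] -> [a]
    limitSortQ n = take n . qsort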

You can build a sorting algorithm better suited to this particular task, one that uses quicksort-style partitioning to get the first n list items in unsorted order. Then just call a highly efficient sorting algorithm on those if you need them sorted.

An example of this approach:

    qselect 0 _  = []
    qselect n [] = error ("can't produce " ++ show n ++ " from empty list")
    qselect n (x:xs)
      | al > n      = qselect n a
      | al + bl > n = a ++ take (n - al) b
      | otherwise   = a ++ b ++ qselect (n - al - bl) c
      where
        (a,al,b,bl,c,cl) = qpartl x (x:xs) ([],0,[],0,[],0)

    qpartl _ [] ac = ac
    qpartl n (x:xs) (a,al,b,bl,c,cl)
      | x > n     = qpartl n xs (a,al,b,bl,x:c,cl+1)
      | x < n     = qpartl n xs (x:a,al+1,b,bl,c,cl)
      | otherwise = qpartl n xs (a,al,x:b,bl+1,c,cl)

Again, this code is not the cleanest, but I want to make it clear what it does.

In the case where you want only a very small number of elements, selection sort is optimal. For example, if you need just the largest element in the list, you can find it in a single pass, in big-Theta of the size of the list.

On the other hand, if you want almost the entire list but do not care about its order, you can repeatedly "delete" the lowest items from the list.

Both of these approaches, like the quicksort above, are O(n^2) in the worst case, but what you usually want is a strategy that runs in O(k*n) without using a ton of space; a small sketch of the first case follows.
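
Here is a minimal sketch of the "very small k" case; selectSmallest is an invented name, not code from this answer:

    import Data.List (delete)

    -- Pull out the current minimum k times: O(k*n) time, no extra structure
    -- beyond the (shrinking) list itself.
    selectSmallest :: Ord a => Int -> [a] -> [a]
    selectSmallest 0 _  = []
    selectSmallest _ [] = []
    selectSmallest k xs = m : selectSmallest (k - 1) (delete m xs)
      where m = minimum xs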

Another option is to use an in-place sorting algorithm to control memory usage. I don't know of any lazy in-place sorts, but if they exist, that would be great.

0
