Getting the n smallest numbers in a sequence

What is the most efficient way to get the n smallest elements from a sequence? For example, given

[ [1 2 3] [9 2 1] [2 3 4] [5 6 7] ] 

I would like to take the 2 smallest elements of the sequence, compared by their first element:

 [1 2 3] [2 3 4] 

I am currently sorting the entire list and then taking the first n elements, but this is probably not the most efficient way. It is a large list, and I need to do this often.
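
For reference, the sort-then-take approach described above presumably looks something like the sketch below. The name `data` is an assumption made for illustration; the sort costs O(n log n) no matter how few elements are taken.

```clojure
;; Baseline: sort everything by first element, then keep the first n.
;; `data` is a hypothetical name for the sequence in the question.
(def data [[1 2 3] [9 2 1] [2 3 4] [5 6 7]])

(defn n-smallest-by-sorting
  [n coll]
  (take n (sort-by first coll)))

(n-smallest-by-sorting 2 data)
```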

3 answers

The Joy of Clojure, chapter 6.4, describes a lazy sorting algorithm. The beauty of lazy sorting is that it does only as much work as is needed to find the first x values. Therefore, if x is much smaller than n, this algorithm is roughly O(n). Here is a modified version of that algorithm.

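    ;; Lazy quicksort: partitions only as far as needed to produce the next element.
    ;; `work` is a list that alternates unsorted parts and already-placed pivots.
    (defn sort-parts [work f]
      (lazy-seq
        (loop [[part & parts] work]
          (if-let [[pivot & xs] (seq part)]
            ;; (f pivot x) is true when x belongs before the pivot
            (let [psmaller? (partial f pivot)]
              (recur (list* (filter psmaller? xs)
                            pivot
                            (remove psmaller? xs)
                            parts)))
            ;; the leading part is empty, so the next item is the smallest remaining
            (when-let [[x & parts] parts]
              (cons x (sort-parts parts f)))))))

    (defn qsort [xs f]
      (sort-parts (list xs) f))

    ;; true when the first element of the left vector is greater than that of the right
    (defn cmp [[a _ _] [b _ _]]
      (> a b))

    (def a [[1 2 3] [9 2 1] [2 3 4] [5 6 7]])

    (take 2 (qsort a cmp))
    ;=> ([1 2 3] [2 3 4])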

For reference, you can use the median of medians algorithm to select the k-th smallest element in linear time, and then partition around it in linear time. This gives you the k smallest elements in O(n). The elements will not be sorted, however, so if you want the k smallest elements in sorted order, that costs another O(k log k).
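
To make the select-then-partition idea concrete, here is a minimal sketch. Note that it uses a randomized quickselect (expected O(n)) rather than the deterministic median of medians, since the randomized variant is much shorter. The function name `k-smallest` and the `keyfn` parameter are assumptions for illustration, and the keys are assumed to be numbers.

```clojure
;; Randomized quickselect sketch: returns the k smallest elements of coll
;; (compared by numeric keyfn) in no particular order.
(defn k-smallest
  [k keyfn coll]
  (loop [items (vec coll), k k, acc []]
    (cond
      (<= k 0) acc
      (<= (count items) k) (into acc items)
      :else
      (let [pivot   (keyfn (rand-nth items))
            smaller (filterv #(< (keyfn %) pivot) items)
            equal   (filterv #(== (keyfn %) pivot) items)
            larger  (filterv #(> (keyfn %) pivot) items)]
        (cond
          ;; all k answers are among the elements below the pivot
          (<= k (count smaller))
          (recur smaller k acc)

          ;; everything below the pivot qualifies, plus some pivot-equal elements
          (<= k (+ (count smaller) (count equal)))
          (into acc (concat smaller (take (- k (count smaller)) equal)))

          ;; keep smaller and equal, continue searching among the larger ones
          :else
          (recur larger
                 (- k (count smaller) (count equal))
                 (into (into acc smaller) equal)))))))

;; Sort the k results afterwards if order matters: O(k log k).
(sort-by first (k-smallest 2 first [[1 2 3] [9 2 1] [2 3 4] [5 6 7]]))
```

Each round discards at least the pivot, so the loop always terminates; the worst case is quadratic, but the expected running time is linear.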

A few important notes:

  • Firstly, although the complexity is O(n), small constants are not guaranteed, and you may see only minimal improvement, especially if your n is small. There are randomized selection algorithms that run faster in practice (typically the expected running time is O(n) with a worse worst case, but they have smaller constants than the deterministic ones).

  • Why can't you maintain the array in sorted order? This is likely to perform much better. You just need to insert each element at the right place, which costs O(log n) with a balanced tree or similar sorted structure (a plain array also pays for shifting elements), and then finding the k smallest is O(1) (or O(k) if you need to copy them out).

  • If you decide against the above, the alternative is to keep the array sorted after each such procedure, append new elements at the end of the array in O(1), and then perform a "merge sort" step each time you need to find the k smallest elements. That is, you sort only the new elements and then merge them with the old ones in linear time. This costs O(m log m + n), where m is the number of elements added since the last sort.
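
The last two notes can be sketched as follows. This is an illustration under assumptions, not a definitive implementation: `sorted-data`, `merge-sorted` and `k-smallest-after-append` are hypothetical names, the keys are assumed to be numbers, and the sorted-set variant relies on Clojure's default vector ordering (by length, then element by element) and silently drops exact duplicates.

```clojure
;; Keep everything in a sorted structure: O(log n) per insert,
;; and the k smallest are simply the first k elements.
(def sorted-data
  (into (sorted-set) [[1 2 3] [9 2 1] [2 3 4] [5 6 7]]))

(take 2 (conj sorted-data [0 5 5]))

;; Alternative: keep the old data sorted, buffer new elements unsorted,
;; and merge lazily only when the k smallest are requested.
(defn merge-sorted
  "Lazily merges two sequences that are already sorted by keyfn."
  [keyfn xs ys]
  (lazy-seq
    (cond
      (empty? xs) ys
      (empty? ys) xs
      (<= (keyfn (first xs)) (keyfn (first ys)))
      (cons (first xs) (merge-sorted keyfn (rest xs) ys))
      :else
      (cons (first ys) (merge-sorted keyfn xs (rest ys))))))

(defn k-smallest-after-append
  "sorted-old is already sorted by keyfn; pending holds the new, unsorted elements."
  [k keyfn sorted-old pending]
  (take k (merge-sorted keyfn sorted-old (sort-by keyfn pending))))
```

Because the merge is lazy, asking for only k elements stops after about k merge steps once the pending elements have been sorted.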


If n is small, you can create a second list of size n that you keep sorted, so you always have quick access to the largest element in it. Iterate through the large list, checking whether each element is smaller than the largest in the small list; if so, insert it into the small list, and if the small list is already full, pop off the previous largest.

If n is less than 3 or 4, you can just brute-force the insertion with a linear scan. If n can be larger, you should do a binary search to find the insertion point for each element. If n can be very large, a different mechanism may be needed.
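
A minimal sketch of this bounded small-list idea, using the brute-force linear insertion that suits a small n. The names `insert-sorted` and `n-smallest` and the `keyfn` parameter are assumptions for illustration, and the keys are assumed to be numbers.

```clojure
;; Insert x into a vector that is already sorted by keyfn (linear scan).
(defn insert-sorted
  [keyfn sorted x]
  (let [[smaller larger] (split-with #(<= (keyfn %) (keyfn x)) sorted)]
    (vec (concat smaller [x] larger))))

;; One pass over coll, keeping a sorted vector of at most n elements.
(defn n-smallest
  [n keyfn coll]
  (if (pos? n)
    (reduce (fn [small x]
              (cond
                ;; small list not full yet: always insert
                (< (count small) n)
                (insert-sorted keyfn small x)

                ;; x beats the current largest: insert it, drop the old largest
                (< (keyfn x) (keyfn (peek small)))
                (pop (insert-sorted keyfn small x))

                :else small))
            []
            coll)
    []))

(n-smallest 2 first [[1 2 3] [9 2 1] [2 3 4] [5 6 7]])
```

The large list is traversed once and most elements are rejected after a single comparison, so the total cost is roughly O(len) when n is small.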
