Search algorithm to find k smallest values ​​in a list

I have a list containing n double values, and I need to find k lower double values ​​in this list

  • k is much less than n
  • the initial list with n double values ​​is randomly ordered.
  • found k lower doubles cannot be sorted

Which algorithm would you recommend?

I am currently using Quicksort to sort the entire list, and then I take the first k items from the sorted list. I expect there should be a much faster algorithm.

Thank you for your help!

+4
source share
5 answers

You can use the selection algorithm to find the k-th lowest element, and then repeat and return it and all the elements below. More work needs to be done if the list may contain duplicates (make sure that you do not get more items that you need).
This solution is O(n) . The selection algorithm is implemented in C ++ as nth_element()

Another option is to use max heap of size k , and also repeat these elements, while maintaining a bunch of all k smallest elements.

 for each element x: if (heap.size() < k): heap.add(x) else if x < heap.max(): heap.pop() heap.add(x) 

When you are done, the heap contains k smallest elements.
This is O(nlogk)

+8
source

You can model your solution according to nlargest () in the Python standard library .

  • Set the first k values ​​to maxheap.
  • Iterate over the remaining n - k values.
  • Compare each item to the top of the heap.
  • If the new value is lower, perform the heapreplace operation (which replaces the top element of the heap with the new value and then sifts it down).

The algorithm can be unexpectedly effective. For example, when n = 100,000 and k = 100, the number of comparisons is usually around 106,000 for randomly located inputs. These are just a little over 100,000 comparisons to find one minimum value. And that's about twenty times fewer comparisons than a complete quick sort over the entire data set.

The relative strength of various algorithms is studied and summarized at: http://code.activestate.com/recipes/577573-compare-algorithms-for-heapqsmallest

+10
source

Take a look at the partial_sort algorithm from the C ++ standard library.

+2
source

You can use std :: nth_element . This is O (N) complexity because it does not sort the elements, it just arranges them so that each element under some N is less than N.

+2
source

you can use selection sorting, O (n) is required to select the first smallest value. Once we have set this lowest value in position 1, we can re-set the data set to find out the second lowest value. and can do this until we have the kth smallest value. thus, if k is sufficiently less than n, then we will have complexity kn, which is equivalent to O (n) ...

0
source

All Articles