Fastest way to get the k smallest numbers in an unsorted list of size N in Python?

What is the fastest way to get the k smallest numbers in an unsorted list of size N using Python?
Is it faster to sort a large list of numbers and then take the k smallest, or to find the minimum of the list k times, removing each minimum found from the list before the next search?

+2
6 answers

You can use a heap queue; it can give you the K largest or smallest numbers from a list of size N in O(N log K) time.

The Python standard library includes the heapq module, which provides a ready-made heapq.nsmallest() function:

import heapq

k_smallest = heapq.nsmallest(k, input_list)

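Internally, this builds a heap of size K from the first K elements of the input list, then iterates over the remaining N-K elements, pushing each one onto the heap and popping off the largest. Each such push and pop takes log K time, making the overall operation O(N log K).

The function also optimizes the following edge cases:

  • If K is 1, min() is used instead, giving an O(N) result.
  • If K >= N, the function sorts instead, since O(N log N) is no worse than O(N log K) in that case.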
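A quick way to check that these two edge cases change only the strategy and not the result. This is a minimal sketch; `data` is made-up sample input.

```python
import heapq

data = [7, 3, 9, 1, 4]  # made-up sample input

# K == 1: the same value min() returns, wrapped in a list
smallest_one = heapq.nsmallest(1, data)

# K >= N: the same result as sorting the whole list
all_sorted = heapq.nsmallest(len(data) + 3, data)

print(smallest_one)  # [1]
print(all_sorted)    # [1, 3, 4, 7, 9]
```

Asking for more elements than the list holds is not an error; you simply get the whole list back in ascending order.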

A better option is the introselect algorithm, which runs in O(n). The only implementation I am aware of is the numpy.partition() function:

import numpy

# assuming you have a python list, you need to convert to a numpy array first
array = numpy.array(input_list)
# partition, slice back to the k smallest elements, convert back to a Python list
k_smallest = numpy.partition(array, k)[:k].tolist()

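If you are not already using numpy, the extra dependency may not be worth it, depending on the size of N (and, for heapq, of K).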

If all you want is the indices, then for either variant you can use:

heapq.nsmallest(k, range(len(input_list)), key=input_list.__getitem__)  # O(NlogK)
numpy.argpartition(numpy.array(input_list), k)[:k].tolist()  # O(N)
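To make the index variant concrete, here is a small standard-library sketch. The sample list and the names `idx` and `values` are made up for illustration.

```python
import heapq

input_list = [40, 10, 30, 20, 50]  # made-up sample input
k = 2

# Rank the positions 0..N-1 by the value stored at each position
idx = heapq.nsmallest(k, range(len(input_list)), key=input_list.__getitem__)
# Map the winning positions back to their values
values = [input_list[i] for i in idx]

print(idx)     # [1, 3]
print(values)  # [10, 20]
```

Note that heapq.nsmallest() returns its results in ascending order of the key, whereas the first k entries produced by numpy.argpartition() come in no particular order.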
+7

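EDIT: this assumes that the list is immutable. If the list is an array and can be modified, linear-time methods are available.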

You can get the complexity down to O(n * log k) by using a heap of size k + 1.

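  • Put the first k elements into a max-heap.
  • For each remaining element, add it to the heap and heapify.
  • Remove the largest element (the root).

Heapify here means restoring the heap property after a single insertion, which is logarithmic in the heap size, so the total cost is O(n * log k).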
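The steps above could be sketched with heapq roughly as follows. This is one possible reading, not a definitive implementation: heapq only provides a min-heap, so the sketch stores negated values to simulate a max-heap (which limits it to numbers), and the function name `k_smallest_stream` is made up for illustration.

```python
import heapq
from itertools import islice

def k_smallest_stream(iterable, k):
    """Return the k smallest numbers of iterable in ascending order (sketch)."""
    if k <= 0:
        return []
    it = iter(iterable)
    # heapq is a min-heap, so negate values to simulate a max-heap
    heap = [-x for x in islice(it, k)]
    heapq.heapify(heap)
    for x in it:
        # Push the new item, then pop the largest of the k + 1 candidates
        heapq.heappushpop(heap, -x)
    # Undo the negation and sort the k survivors
    return sorted(-x for x in heap)

print(k_smallest_stream([5, 1, 4, 2, 3], 3))  # [1, 2, 3]
```

Because it consumes an iterator, this version never needs more than k items in memory at once.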

+3

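Finding and removing the minimum k times is O(kn), while sorting is O(n log n). kn >= n log n whenever k >= log n. Strictly speaking, though, the two approaches have different constant factors, say i for the first (i * kn) and j for the second (j * n log n), so which one is faster depends on n and k.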
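Since the answer depends on n, k and the constant factors, measuring is the practical way to decide. Below is a minimal timing sketch using only the standard library; the function names, the input size and the values of k are arbitrary choices for illustration, and the timings will vary by machine.

```python
import heapq
import random
import timeit

def by_repeated_min(data, k):
    # O(k * n): find and remove the minimum k times, working on a copy
    pool = list(data)
    out = []
    for _ in range(min(k, len(pool))):
        m = min(pool)
        pool.remove(m)
        out.append(m)
    return out

def by_sorting(data, k):
    # O(n log n): sort everything, keep the first k
    return sorted(data)[:k]

def by_heap(data, k):
    # O(n log k): bounded heap from the standard library
    return heapq.nsmallest(k, data)

if __name__ == "__main__":
    random.seed(0)
    data = [random.random() for _ in range(20_000)]
    for k in (1, 10, 100):
        for fn in (by_repeated_min, by_sorting, by_heap):
            t = timeit.timeit(lambda: fn(data, k), number=3)
            print(f"k={k:<4} {fn.__name__:<16} {t:.3f}s")
```

All three functions return the same list, so they can be swapped freely once you know which is fastest for your sizes.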


+2

Using heapq:

In [109]: L = [random.randint(1,1000) for _ in range(100)]

In [110]: heapq.nsmallest(10, L)
Out[110]: [1, 17, 17, 19, 24, 37, 37, 45, 63, 73]
+2

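If the list of the k smallest numbers does not need to be sorted, this can be done in O(n) time with a selection algorithm such as introselect. The standard library does not include one, but NumPy provides numpy.partition for the job: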

partitioned = numpy.partition(l, k)
# The subarray partitioned[:k] now contains the k smallest elements.
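If NumPy is not available, the same idea can be illustrated in pure Python. Note that the sketch below uses quickselect, not introselect: it runs in linear time on average but has no worst-case guarantee, and it will usually be slower in practice than numpy.partition. The function name `k_smallest_quickselect` is made up for illustration.

```python
import random

def k_smallest_quickselect(items, k):
    """Return the k smallest items, in no particular order (sketch)."""
    items = list(items)
    if k <= 0:
        return []
    if k >= len(items):
        return items
    result = []
    while True:
        pivot = random.choice(items)
        lows = [x for x in items if x < pivot]
        pivots = [x for x in items if x == pivot]
        highs = [x for x in items if x > pivot]
        if k < len(lows):
            # All k answers lie among the values below the pivot
            items = lows
        elif k <= len(lows) + len(pivots):
            # lows plus enough copies of the pivot complete the answer
            return result + lows + pivots[:k - len(lows)]
        else:
            # Keep lows and pivots, continue searching above the pivot
            result += lows + pivots
            k -= len(lows) + len(pivots)
            items = highs

print(sorted(k_smallest_quickselect([9, 1, 8, 2, 7, 3], 3)))  # [1, 2, 3]
```

As with numpy.partition, the returned elements are unordered; sort the result if you need them in ascending order.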
+2

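nsmallest from heapq is what you should normally use, but for illustration here is a hand-written version. heappush and heappop are each O(log n), where n is the size of the heap, which here is k.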

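import heapq

def getsmallest(arr, k):
    # heapq is a min-heap; negate the values to simulate a max-heap of size k
    m = [-x for x in arr[:k]]
    heapq.heapify(m)
    for num in arr[k:]:
        # Pop the current largest and push back the smaller of it and num
        heapq.heappush(m, max(-num, heapq.heappop(m)))
    # Undo the negation and return the k smallest in ascending order
    return sorted(-x for x in m)

if __name__ == '__main__':
    l = [1, 2, 3, 52, 2, 3, 1]
    print(getsmallest(l, 5))  # [1, 1, 2, 2, 3]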
0
