How can I get the n largest lists from a list of lists in Python?

I am using heapq to get the largest items from a list of lists. The program I wrote is below.

    import csv
    import heapq

    f = open("E:/output.csv","r")
    read = csv.reader(f)
    allrows = [row for row in read]
    for i in xrange(0,2):
        print allrows[i]
    allrows.sort(key=lambda x: x[2]) #this is working properly
    it=heapq.nlargest(20,enumerate(allrows),key=lambda x:x[2]) #error

I just need the top 20 items, so instead of sorting I thought about using a heap. The error I get is:

    Traceback (most recent call last):
      File "D:\eclipse_progs\DaDv\IMDB\Assignment1.py", line 42, in <module>
        it=heapq.nlargest(2,enumerate(allrows),key=lambda x:x[2])
      File "C:\Python27\lib\heapq.py", line 470, in nlargest
        result = _nlargest(n, it)
      File "D:\eclipse_progs\DaDv\IMDB\Assignment1.py", line 42, in <lambda>
        it=heapq.nlargest(2,enumerate(allrows),key=lambda x:x[2])
    IndexError: tuple index out of range

Why do I get this error, and how can I fix it? Is there something about how heapq works that I am missing?

2 answers

enumerate() returns an iterator over 2-tuples of the form (index, item). Accessing x[2] in the key function of the nlargest() call will therefore always be out of range (the only valid indices are 0 and 1).
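A minimal sketch of why the key function fails. The `rows` data here is made up for illustration and is not the asker's CSV:

```python
# Hypothetical sample rows standing in for the CSV data.
rows = [["a", "x", "3"], ["b", "y", "1"]]

# enumerate() yields (index, row) pairs, i.e. tuples of length 2.
pairs = list(enumerate(rows))
first = pairs[0]  # (0, ["a", "x", "3"])

try:
    first[2]        # only indices 0 and 1 exist on the pair
    failed = False
except IndexError:
    failed = True   # the same kind of error as in the traceback

# The third field of the row itself is reached through the pair's second element.
third_field = first[1][2]
```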

To make the nlargest() call equivalent to the sort() call, pass allrows directly and do not use enumerate():

    it = heapq.nlargest(20, allrows, key=lambda x:x[2])

If you need to keep the original indexes, enumerate() is the way to go. However, you also need an additional level of indirection in the key function:

    it = heapq.nlargest(20, enumerate(allrows), key=lambda x:x[1][2])
                            ^^^^^^^^^                         ^^^
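Both variants can be compared side by side in a short sketch. The `rows` list is hypothetical sample data, and its third field is assumed to be numeric already:

```python
import heapq

# Hypothetical rows; the third field is already a number here.
rows = [["a", "x", 3], ["b", "y", 9], ["c", "z", 5], ["d", "w", 7]]

# Without indexes: the key function receives each row directly.
top_rows = heapq.nlargest(2, rows, key=lambda row: row[2])

# With indexes: the key function receives (index, row), so the row is pair[1].
top_pairs = heapq.nlargest(2, enumerate(rows), key=lambda pair: pair[1][2])
```

Since csv.reader yields every field as a string, a key such as `lambda row: float(row[2])` may be needed if the column is meant to be compared numerically. That is an assumption about the data, which the question does not show.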

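Thanks to NPE for pointing out the problem. As an alternative, you can flatten all your rows into a single sequence with itertools.chain(), sort it, and take the last 20 elements. Note that this ranks individual fields rather than whole rows, and whether it is faster than heapq depends on how N compares with the size of the data: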

    from itertools import chain
    sorted(chain(*allrows))[-20:]
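To see what the expression above computes, here is a small sketch with hypothetical numeric rows. It suggests that the result contains individual values drawn from any row, not whole rows:

```python
from itertools import chain

# Hypothetical rows of numbers, used only for illustration.
rows = [[1, 8], [5, 2], [9, 3]]

# chain(*rows) flattens the rows into one stream of values.
flat = list(chain(*rows))

# Sorting the flattened values and slicing gives the largest values overall.
top2_values = sorted(chain(*rows))[-2:]
```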

The nlargest() and nsmallest() functions are most suitable if you are trying to find a relatively small number of items. If you're just trying to find the single smallest or largest element (N=1), it is faster to use min() and max(). Similarly, if N is about the same size as the collection itself, it is usually faster to sort it first and take a slice (i.e., use sorted(items)[:N] or sorted(items)[-N:]).
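The three cases described above can be sketched as follows. The `items` list is made-up sample data, and the snippet only shows that the approaches agree on the result, not how their speed compares:

```python
import heapq

items = [5, 1, 9, 3, 7, 2]  # hypothetical sample data

# N = 1: max() (or min()) is the direct tool.
largest = max(items)

# Small N: nlargest() returns the results in descending order.
top3_heap = heapq.nlargest(3, items)

# N close to len(items): sort once and take a slice.
top3_sorted = sorted(items, reverse=True)[:3]
```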
