How do I use itertools.groupby when the key value is inside the iterable's elements?

To illustrate, I start with a list of 2-tuples:

    import itertools
    import operator

    raw = [(1, "one"), (2, "two"), (1, "one"), (3, "three"), (2, "two")]

    for key, grp in itertools.groupby(raw, key=lambda item: item[0]):
        print(key, list(grp).pop()[1])

gives:

    1 one
    2 two
    1 one
    3 three
    2 two

In an attempt to find out why:

    for key, grp in itertools.groupby(raw, key=lambda item: item[0]):
        print(key, list(grp))

    # ---- OUTPUT ----
    # 1 [(1, 'one')]
    # 2 [(2, 'two')]
    # 1 [(1, 'one')]
    # 3 [(3, 'three')]
    # 2 [(2, 'two')]

Even this will give me the same result:

    for key, grp in itertools.groupby(raw, key=operator.itemgetter(0)):
        print(key, list(grp))

I want to get something like:

    1 one, one
    2 two, two
    3 three

I suspect this happens because the key sits inside each tuple in the list, while the tuple itself is handled as a single unit. Is there a way to achieve the desired result? Or is groupby() simply not suitable for this task?

+7
python group-by itertools
3 answers

groupby clusters consecutive elements of the iterable that share the same key. To get the desired result, you must first sort raw.

    for key, grp in itertools.groupby(sorted(raw), key=operator.itemgetter(0)):
        # Wrap map() in list() so the values print as a list on Python 3.
        print(key, list(map(operator.itemgetter(1), grp)))

    # 1 ['one', 'one']
    # 2 ['two', 'two']
    # 3 ['three']
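
A possible variation, offered only as a sketch: sorting on the key alone rather than on the whole tuple. Because sorted() is stable, the values under each key should then stay in their input order, and the values themselves never need to be comparable. The helper name group_values and the sample data are made up for illustration and do not come from the question.

```python
import itertools
import operator


def group_values(pairs):
    """Group the second items by the first item, keeping input order per key."""
    first = operator.itemgetter(0)
    # sorted() is stable, so sorting on the key alone leaves the values
    # under each key in the order they appeared in the input.
    by_key = sorted(pairs, key=first)
    return [
        (key, [value for _, value in grp])
        for key, grp in itertools.groupby(by_key, key=first)
    ]


# Hypothetical data, chosen so that the values under one key differ.
sample = [(1, "b"), (2, "x"), (1, "a")]
grouped = group_values(sample)
print(grouped)  # [(1, ['b', 'a']), (2, ['x'])]
```

With plain sorted(sample), the values under key 1 would come out as ['a', 'b'] instead, since whole tuples are compared.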
+9

I think a cleaner way to get the desired result is this.

    >>> from collections import defaultdict
    >>> d = defaultdict(list)
    >>> for k, v in raw:
    ...     d[k].append(v)
    ...
    >>> for k, v in sorted(d.items()):
    ...     print(k, v)
    ...
    1 ['one', 'one']
    2 ['two', 'two']
    3 ['three']

Building d is O(n), and sorted() now runs over only the unique keys instead of the entire data set.
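
As a rough sketch of how the same idea could produce the exact format asked for in the question ("1 one, one"), the grouped values can be joined into one line per key. The function name format_groups is hypothetical, and the code assumes every value is a string.

```python
from collections import defaultdict

raw = [(1, "one"), (2, "two"), (1, "one"), (3, "three"), (2, "two")]


def format_groups(pairs):
    """Return one 'key value, value, ...' line per distinct key."""
    d = defaultdict(list)
    for k, v in pairs:
        d[k].append(v)  # single O(n) pass over the data
    # Only the unique keys are sorted here, not the whole input.
    return ["{} {}".format(k, ", ".join(d[k])) for k in sorted(d)]


lines = format_groups(raw)
for line in lines:
    print(line)
# 1 one, one
# 2 two, two
# 3 three
```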

+6

From the docs:

The operation of groupby() is similar to the uniq filter in Unix. It generates a break or new group every time the value of the key function changes (which is why it is usually necessary to have sorted the data using the same key function). That behavior differs from SQL's GROUP BY which aggregates common elements regardless of their input order.

Since you are sorting the tuples lexicographically anyway, you can simply call sorted():

    for key, grp in itertools.groupby(sorted(raw), key=operator.itemgetter(0)):
        # map() consumes grp directly, so no intermediate list(grp) is needed.
        print(key, list(map(operator.itemgetter(1), grp)))
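
To make the uniq-like behavior from the docs quote concrete, here is a small sketch that compares the keys groupby() emits for the unsorted and the sorted input. The variable names unsorted_keys and sorted_keys are introduced here only for illustration.

```python
import itertools
import operator

raw = [(1, "one"), (2, "two"), (1, "one"), (3, "three"), (2, "two")]
first = operator.itemgetter(0)

# Unsorted input: a new group starts every time the key changes,
# so keys 1 and 2 each show up twice.
unsorted_keys = [key for key, _ in itertools.groupby(raw, key=first)]

# Sorted input: equal keys are adjacent, so each key shows up once.
sorted_keys = [key for key, _ in itertools.groupby(sorted(raw, key=first), key=first)]

print(unsorted_keys)  # [1, 2, 1, 3, 2]
print(sorted_keys)    # [1, 2, 3]
```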
+2
