Python 2.6 - removing and counting duplicates in a dictionary list efficiently

I am trying to efficiently convert:

[{'text': 'hallo world', 'num': 1}, {'text': 'hallo world', 'num': 2}, {'text': 'hallo world', 'num': 1}, {'text': 'haltlo world', 'num': 1}, {'text': 'hallo world', 'num': 1}, {'text': 'hallo world', 'num': 1}, {'text': 'hallo world', 'num': 1}] 

into a list of dictionaries with the duplicates removed and a count of how many times each one occurred:

 [{'text': 'hallo world', 'num': 2, 'count':1}, {'text': 'hallo world', 'num': 1, 'count':5}, {'text': 'haltlo world', 'num': 1, 'count':1}] 

So far, I have the following to remove the duplicates:

 result = [dict(tupleized) for tupleized in set(tuple(item.items()) for item in li)] 

and it returns:

 [{'text': 'hallo world', 'num': 2}, {'text': 'hallo world', 'num': 1}, {'text': 'haltlo world', 'num': 1}] 

THANKS!
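The set-of-tuples attempt above already deduplicates; the counts are lost only because a set collapses repeats. A minimal sketch of keeping them, assuming every value is hashable, is to count the tuples in a dict instead. `collections.Counter` only arrived in Python 2.7, so this uses `defaultdict(int)`, which exists in 2.6. Sorting each dict's items makes the key independent of the order `items()` returns. The helper name `dedupe_with_counts` is hypothetical.

```python
from collections import defaultdict

def dedupe_with_counts(li):
    """Return the unique dicts in li, each with an added 'count' key.

    Hypothetical helper; assumes all dict values are hashable.
    """
    counts = defaultdict(int)  # defaultdict works on 2.6; Counter needs 2.7+
    for item in li:
        # Sorting the (key, value) pairs gives equal dicts the same tuple,
        # regardless of the order items() happens to return them in.
        counts[tuple(sorted(item.items()))] += 1
    result = []
    for key, count in counts.items():
        d = dict(key)
        d['count'] = count
        result.append(d)
    return result
```

The order of the returned list follows dict iteration order, so it is not guaranteed to match the expected output above; sort the result afterwards if order matters.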

2 answers

I would use one of my favorites from itertools, groupby:

    from itertools import groupby

    def canonicalize_dict(x):
        "Return a (key, value) list sorted by the hash of the key"
        return sorted(x.items(), key=lambda x: hash(x[0]))

    def unique_and_count(lst):
        "Return a list of unique dicts with a 'count' key added"
        grouper = groupby(sorted(map(canonicalize_dict, lst)))
        return [dict(k + [("count", len(list(g)))]) for k, g in grouper]

    a = [{'text': 'hallo world', 'num': 1},
         #....
         {'text': 'hallo world', 'num': 1}]

    print unique_and_count(a)

Output

 [{'count': 5, 'text': 'hallo world', 'num': 1}, {'count': 1, 'text': 'hallo world', 'num': 2}, {'count': 1, 'text': 'haltlo world', 'num': 1}] 

As gnibbler points out, d1.items() and d2.items() can return the pairs in different orders even when the two dicts have identical keys, so I introduced canonicalize_dict to solve this problem.
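A minimal sketch of the same groupby technique, written so it runs on Python 2.6 and on Python 3. It canonicalizes with a plain `sorted(d.items())`, which orders the pairs by key (keys within one dict are unique) rather than by `hash(key)`, so two different keys with colliding hashes cannot tie. It assumes values stored under the same key in different dicts are mutually comparable, because the outer sort compares them. The function name `unique_and_count_sorted` is hypothetical.

```python
from itertools import groupby

def canonical(d):
    # Sort (key, value) pairs by key so equal dicts give equal lists,
    # whatever order items() returns them in.
    return sorted(d.items())

def unique_and_count_sorted(lst):
    # groupby only merges *adjacent* equal items, so sort the canonical lists first.
    grouped = groupby(sorted(canonical(d) for d in lst))
    # Each group key is a canonical list; append the count pair and rebuild a dict.
    return [dict(key + [('count', len(list(group)))]) for key, group in grouped]
```

With the question's data, the sort places the `'num': 1, 'text': 'hallo world'` group first, so that dict (with `'count': 5`) is the first element of the result.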


Note: this uses frozenset, which means the values in the dictionaries must be hashable.

    >>> from collections import defaultdict
    >>> from itertools import chain
    >>> data = [{'text': 'hallo world', 'num': 1}, {'text': 'hallo world', 'num': 2},
    ...         {'text': 'hallo world', 'num': 1}, {'text': 'haltlo world', 'num': 1},
    ...         {'text': 'hallo world', 'num': 1}, {'text': 'hallo world', 'num': 1},
    ...         {'text': 'hallo world', 'num': 1}]
    >>> c = defaultdict(int)
    >>> for d in data:
    ...     c[frozenset(d.iteritems())] += 1
    ...
    >>> [dict(chain(k, (('count', count),))) for k, count in c.iteritems()]
    [{'count': 1, 'text': 'haltlo world', 'num': 1}, {'count': 1, 'text': 'hallo world', 'num': 2}, {'count': 5, 'text': 'hallo world', 'num': 1}]
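For readers on Python 2.7 or 3.1+, a minimal sketch of the same frozenset approach using `collections.Counter`, which is not available in 2.6. Because a frozenset ignores order, no canonicalization step is needed. The second part illustrates the hashability caveat from the note: a list value makes `frozenset(d.items())` raise `TypeError`. The function name `unique_with_counts` is hypothetical.

```python
from collections import Counter  # Python 2.7 / 3.1+ only; on 2.6 use defaultdict(int)

def unique_with_counts(data):
    # A frozenset of (key, value) pairs is order-independent and hashable,
    # provided every value is hashable.
    counts = Counter(frozenset(d.items()) for d in data)
    # dict() accepts an iterable of pairs plus keyword arguments.
    return [dict(k, count=n) for k, n in counts.items()]

# Illustration of the hashability requirement: a list value cannot be hashed.
try:
    unique_with_counts([{'tags': ['a']}])
except TypeError as exc:
    print("unhashable value: %s" % exc)
```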
