Python - List of Unique Dictionaries

Say I have a list of dictionaries:

 [
     {'id': 1, 'name': 'john', 'age': 34},
     {'id': 1, 'name': 'john', 'age': 34},
     {'id': 2, 'name': 'hanna', 'age': 30},
 ]

and I need to get a list of unique dictionaries (duplicate removal):

 [
     {'id': 1, 'name': 'john', 'age': 34},
     {'id': 2, 'name': 'hanna', 'age': 30},
 ]

Can someone help me with the most efficient way to achieve this in Python?

+116
python dictionary
Jun 18 '12 at 23:30
16 answers

So create a temporary dict keyed on the id. Duplicate ids collapse onto the same key, which filters out the duplicates, and the dict's values() give you back the list.

In Python 2.7

 >>> L = [
 ...     {'id': 1, 'name': 'john', 'age': 34},
 ...     {'id': 1, 'name': 'john', 'age': 34},
 ...     {'id': 2, 'name': 'hanna', 'age': 30},
 ... ]
 >>> {v['id']: v for v in L}.values()
 [{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]

In Python 3

 >>> L = [
 ...     {'id': 1, 'name': 'john', 'age': 34},
 ...     {'id': 1, 'name': 'john', 'age': 34},
 ...     {'id': 2, 'name': 'hanna', 'age': 30},
 ... ]
 >>> list({v['id']: v for v in L}.values())
 [{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]

In Python 2.5 / 2.6

 >>> L = [
 ...     {'id': 1, 'name': 'john', 'age': 34},
 ...     {'id': 1, 'name': 'john', 'age': 34},
 ...     {'id': 2, 'name': 'hanna', 'age': 30},
 ... ]
 >>> dict((v['id'], v) for v in L).values()
 [{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]
+190
Jun 18 '12

The usual way to get the unique elements of a collection is to use Python's set class. Just add all the elements to a set, convert the set back to a list, and bam, the duplicates are gone.

The problem, of course, is that a set() can only contain hashable entries, and a dict is not hashable.
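Both halves of that claim can be checked in a couple of lines (a minimal sketch; the sample values are arbitrary):

```python
# A set dedupes hashable items directly:
assert sorted(set([1, 1, 2])) == [1, 2]

# ...but a dict is unhashable, so putting one in a set raises TypeError:
try:
    set([{'id': 1}])
    raised = False
except TypeError:
    raised = True
```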

If I had this problem, my solution would be to convert each dict to a string that represents the dict, add all the strings to a set(), then read the string values back out as a list() and convert each one back to a dict.

A good string representation of a dict is the JSON format, and Python has a built-in module for JSON (called json, of course).

The remaining problem is that the keys in a dict are not ordered, so when Python converts a dict to a JSON string, you can get two JSON strings that represent equivalent dictionaries but are not identical strings. The simple solution is to pass sort_keys=True when you call json.dumps().
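Putting those pieces together, a minimal sketch of the JSON round-trip (using the question's sample data):

```python
import json

dicts = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]

# Equal dicts serialize to identical strings once keys are sorted,
# so a set of the strings drops the duplicates.
unique = [json.loads(s) for s in {json.dumps(d, sort_keys=True) for d in dicts}]
```

Note that this compares every key and value, not just the id, so it also works when no single key uniquely identifies a dict.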

EDIT: this solution assumed that the dicts could differ in any of their values. If we can assume that every dict with the same "id" value matches every other dict with the same "id" value, then this is overkill; @gnibbler's solution would be faster and easier.

EDIT: there is now a comment from Andre Lima explicitly stating that if the id is a duplicate, it is safe to assume the whole dict is a duplicate. So this answer is redundant and I recommend @gnibbler's answer.

+63
Jun 18 '12

You can use the numpy library (this works only for Python 2.x):

 import numpy as np

 list_of_unique_dicts = list(np.unique(np.array(list_of_dicts)))

For this to work with Python 3.x (and the latest versions of numpy), you need to convert the dicts array to a numpy string array, e.g.

 list_of_unique_dicts = list(np.unique(np.array(list_of_dicts).astype(str)))
+17
Nov 06 '13 at 4:25

If the dictionaries are uniquely identified by all their items (no id is available), you can use the JSON-based answer. The following is an alternative that does not use JSON and will work as long as all the dictionary values are immutable (and therefore hashable):

 [dict(s) for s in set(frozenset(d.items()) for d in L)] 
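In use, with the question's sample data (a sketch; the variable name L is assumed from the question):

```python
L = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]

# frozenset(d.items()) is hashable and compares equal for equal dicts,
# so collecting the frozensets in a set removes the duplicates.
unique = [dict(s) for s in set(frozenset(d.items()) for d in L)]
```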
+14
Jul 22 '16 at 8:00

Here's a fairly compact solution, although I suspect that it is not particularly efficient (to put it mildly):

 >>> ds = [
 ...     {'id': 1, 'name': 'john', 'age': 34},
 ...     {'id': 1, 'name': 'john', 'age': 34},
 ...     {'id': 2, 'name': 'hanna', 'age': 30}
 ... ]
 >>> map(dict, set(tuple(sorted(d.items())) for d in ds))
 [{'age': 30, 'id': 2, 'name': 'hanna'}, {'age': 34, 'id': 1, 'name': 'john'}]
+13
Jun 18 '12

Since the id is enough to detect duplicates, and the id is hashable: run the items through a dictionary keyed on the id. The value for each key is the original dictionary.

 deduped_dicts = dict((item["id"], item) for item in list_of_dicts).values() 

In Python 3, values() does not return a list; you need to wrap the whole right-hand side of that expression in list(), and you can write the meat of the expression more economically as a dict comprehension:

 deduped_dicts = list({item["id"]: item for item in list_of_dicts}.values()) 

Note that the result will probably not be in the same order as the original. If that is a requirement, you can use collections.OrderedDict instead of dict.
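Where input order must be preserved, a minimal sketch with collections.OrderedDict (sample data assumed):

```python
from collections import OrderedDict

list_of_dicts = [
    {'id': 2, 'name': 'hanna', 'age': 30},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
]

# An OrderedDict keeps keys in first-insertion order, so the deduped
# result comes out in the same order the ids were first seen.
deduped = list(OrderedDict((d['id'], d) for d in list_of_dicts).values())
```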

As an aside, it may make a lot of sense to just keep the data in a dictionary keyed on id in the first place.

+7
Jun 18 '12 at 23:45
 a = [
     {'id': 1, 'name': 'john', 'age': 34},
     {'id': 1, 'name': 'john', 'age': 34},
     {'id': 2, 'name': 'hanna', 'age': 30},
 ]
 b = {x['id']: x for x in a}.values()
 print(b)

outputs:

[{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]

+6
Jun 18 '12 at 23:52

An extension of John La Rooy's answer ( Python - List of Unique Dictionaries ) that makes it more flexible:

 def dedup_dict_list(list_of_dicts: list, columns: list) -> list:
     # str() so that non-string values (e.g. an integer id) can be joined
     return list({''.join(str(row[column]) for column in columns): row
                  for row in list_of_dicts}.values())

Call Function:

 sorted_list_of_dicts = dedup_dict_list( unsorted_list_of_dicts, ['id', 'name']) 
+3
Sep 04 '17 at 16:14

A quick and dirty solution is to simply create a new list.

 sortedlist = []
 for item in listwhichneedssorting:
     if item not in sortedlist:
         sortedlist.append(item)
+1
Sep 17 '16 at 23:58

In Python 3.6+ (which I tested), just use:

 import json

 # Toy example, but will also work for your case
 myListOfDicts = [{'a': 1, 'b': 2}, {'a': 1, 'b': 2}, {'a': 1, 'b': 3}]

 # Serialize with sorted keys, dedupe the strings with set(),
 # then parse each unique string back into a dict
 myListOfUniqueDicts = list(map(json.loads, set(json.dumps(d, sort_keys=True) for d in myListOfDicts)))

 print(myListOfUniqueDicts)

Explanation: we use json.dumps to encode each dictionary as a JSON string, which is immutable and hashable, so set() can be used to keep only the unique ones; json.loads then converts each unique string back into a dictionary. Note that the keys must be serialized in a sorted, canonical order, so that two equivalent dictionaries always produce the identical string.

+1
Oct 02 '18 at 19:47

We can do this with pandas:

 import pandas as pd

 yourdict = pd.DataFrame(L).drop_duplicates().to_dict('records')

 Out[293]:
 [{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]

Note that this is a little different from the accepted answer.

drop_duplicates checks all columns in pandas; a row is dropped only if all of them match.

For example:

For example, if we change the name in the second dict from john to peter:

 L = [
     {'id': 1, 'name': 'john', 'age': 34},
     {'id': 1, 'name': 'peter', 'age': 34},
     {'id': 2, 'name': 'hanna', 'age': 30},
 ]
 pd.DataFrame(L).drop_duplicates().to_dict('records')

 Out[295]:
 [{'age': 34, 'id': 1, 'name': 'john'},
  {'age': 34, 'id': 1, 'name': 'peter'},  # this dict is still kept in the output
  {'age': 30, 'id': 2, 'name': 'hanna'}]
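If you instead want pandas to dedupe on the id alone, matching the accepted answer's behaviour, drop_duplicates takes a subset parameter; a sketch:

```python
import pandas as pd

L = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'peter', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]

# subset='id' compares only the id column; keep='first' (the default)
# retains the first row seen for each id.
deduped = pd.DataFrame(L).drop_duplicates(subset='id').to_dict('records')
```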
+1
Jun 07 '19 at 1:37

Pretty simple option:

 L = [
     {'id': 1, 'name': 'john', 'age': 34},
     {'id': 1, 'name': 'john', 'age': 34},
     {'id': 2, 'name': 'hanna', 'age': 30},
 ]

 D = dict()
 for l in L:
     D[l['id']] = l
 output = list(D.values())
 print(output)
0
Jun 18 '12

I don't know if you only want the ids of the dicts in your list to be unique, but if the goal is a set of dicts where uniqueness is over the values of all the keys, you should use a tuple of those keys in your comprehension:

 >>> L = [
 ...     {'id': 1, 'name': 'john', 'age': 34},
 ...     {'id': 1, 'name': 'john', 'age': 34},
 ...     {'id': 2, 'name': 'hanna', 'age': 30},
 ...     {'id': 2, 'name': 'hanna', 'age': 50}
 ... ]
 >>> len(L)
 4
 >>> L = list({(v['id'], v['age'], v['name']): v for v in L}.values())
 >>> L
 [{'id': 1, 'name': 'john', 'age': 34}, {'id': 2, 'name': 'hanna', 'age': 30}, {'id': 2, 'name': 'hanna', 'age': 50}]
 >>> len(L)
 3

Hope this helps you or someone else having this problem.

0
Jun 26 '18 at 17:11

There are many answers here, so let me add one more:

 import json
 from typing import List


 def dedup_dicts(items: List[dict]):
     dedupped = [
         json.loads(i) for i in set(json.dumps(item, sort_keys=True) for item in items)]
     return dedupped


 items = [
     {'id': 1, 'name': 'john', 'age': 34},
     {'id': 1, 'name': 'john', 'age': 34},
     {'id': 2, 'name': 'hanna', 'age': 30},
 ]
 dedup_dicts(items)
0
Mar 13 '19 at 13:24

This implementation has lower memory overhead, at the cost of not being as compact as the rest.

 values = [
     {'id': 2, 'name': 'hanna', 'age': 30},
     {'id': 1, 'name': 'john', 'age': 34},
     {'id': 1, 'name': 'john', 'age': 34},
     {'id': 2, 'name': 'hanna', 'age': 30},
     {'id': 1, 'name': 'john', 'age': 34},
 ]

 count = {}
 index = 0
 while index < len(values):
     if values[index]['id'] in count:
         del values[index]
     else:
         count[values[index]['id']] = 1
         index += 1

Output:

 [{'age': 30, 'id': 2, 'name': 'hanna'}, {'age': 34, 'id': 1, 'name': 'john'}] 
-1
Jun 18 '12 at 23:52

This is the solution I found:

 usedID = []

 x = [
     {'id': 1, 'name': 'john', 'age': 34},
     {'id': 1, 'name': 'john', 'age': 34},
     {'id': 2, 'name': 'hanna', 'age': 30},
 ]

 # Iterate over a copy (x[:]) so that removing items from x
 # does not skip elements during iteration.
 for each in x[:]:
     if each['id'] in usedID:
         x.remove(each)
     else:
         usedID.append(each['id'])

 print(x)

Basically, you check whether the id is already in the list; if it is, delete the dictionary, and if not, append the id to the list.

-3
Jun 18 '12


