Remove duplicates from nested dictionaries in a list

A quick and very simple newbie question.

If I have a list of dictionaries looking like this:

    L = []
    L.append({"value1": value1, "value2": value2, "value3": value3, "value4": value4})

Suppose there are several entries where value3 and value4 are identical to those of other dictionaries in the list. How can I quickly and easily find and delete these duplicate dictionaries?

Keeping order does not matter.

Thanks.

EDIT:

If there are five inputs, for example:

    L = [{"value1": "fssd", "value2": "dsfds", "value3": "abcd", "value4": "gk"},
         {"value1": "asdasd", "value2": "asdas", "value3": "dafdd", "value4": "sdfsdf"},
         {"value1": "sdfsf", "value2": "sdfsdf", "value3": "abcd", "value4": "gk"},
         {"value1": "asddas", "value2": "asdsa", "value3": "abcd", "value4": "gk"},
         {"value1": "asdasd", "value2": "dskksks", "value3": "ldlsld", "value4": "sdlsld"}]

The output should be as follows:

    L = [{"value1": "fssd", "value2": "dsfds", "value3": "abcd", "value4": "gk"},
         {"value1": "asdasd", "value2": "asdas", "value3": "dafdd", "value4": "sdfsdf"},
         {"value1": "asdasd", "value2": "dskksks", "value3": "ldlsld", "value4": "sdlsld"}]
6 answers

In Python 2.6 or 3.*:

    import itertools
    import operator
    import pprint

    L = [{"value1": "fssd", "value2": "dsfds", "value3": "abcd", "value4": "gk"},
         {"value1": "asdasd", "value2": "asdas", "value3": "dafdd", "value4": "sdfsdf"},
         {"value1": "sdfsf", "value2": "sdfsdf", "value3": "abcd", "value4": "gk"},
         {"value1": "asddas", "value2": "asdsa", "value3": "abcd", "value4": "gk"},
         {"value1": "asdasd", "value2": "dskksks", "value3": "ldlsld", "value4": "sdlsld"}]

    # Key built from the two fields that define a duplicate.
    getvals = operator.itemgetter('value3', 'value4')

    # groupby only groups adjacent items, so sort by the same key first.
    L.sort(key=getvals)

    result = []
    for k, g in itertools.groupby(L, getvals):
        # Keep the first dictionary of each group.
        result.append(next(g))

    # Replace the contents of the list in place.
    L[:] = result
    pprint.pprint(L)

Almost the same thing works in Python 2.5, except that you have to use g.next() instead of next(g) in the append.
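The sort step matters because itertools.groupby only merges consecutive items that share a key. A small sketch of the difference, using made-up data (the names rows and key are illustrative and not part of the answer above):

```python
import itertools
import operator

# Illustrative data, not taken from the question.
rows = [
    {"value3": "abcd", "value4": "gk"},
    {"value3": "dafdd", "value4": "sdfsdf"},
    {"value3": "abcd", "value4": "gk"},
]
key = operator.itemgetter("value3", "value4")

# Without sorting, equal keys that are not adjacent land in separate groups.
unsorted_groups = [k for k, _ in itertools.groupby(rows, key)]

# After sorting by the same key, each distinct key appears exactly once.
sorted_groups = [k for k, _ in itertools.groupby(sorted(rows, key=key), key)]

print(len(unsorted_groups))  # 3
print(len(sorted_groups))    # 2
```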


Here is one way:

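    from itertools import groupby

    # Key built from the two fields that define a duplicate.
    keyfunc = lambda d: (d['value3'], d['value4'])

    # Sort first so that dictionaries with equal keys are adjacent.
    giter = groupby(sorted(L, key=keyfunc), keyfunc)

    # g is a (key, group) pair; take the first dictionary of each group.
    L2 = [next(g[1]) for g in giter]
    print(L2)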

You can use a temporary list to store the dict items already seen. The previous code was buggy because it removed items from the list inside a for loop.

    (v, r) = ([], [])
    for i in l:
        # Keep i only if neither its value4 pair nor its value3 pair was seen before.
        if ('value4', i['value4']) not in v and ('value3', i['value3']) not in v:
            r.append(i)
            v.extend(i.items())
    l = r

Your test:

    l = [{"value1": 'fssd', "value2": 'dsfds', "value3": 'abcd', "value4": 'gk'},
         {"value1": 'asdasd', "value2": 'asdas', "value3": 'dafdd', "value4": 'sdfsdf'},
         {"value1": 'sdfsf', "value2": 'sdfsdf', "value3": 'abcd', "value4": 'gk'},
         {"value1": 'asddas', "value2": 'asdsa', "value3": 'abcd', "value4": 'gk'},
         {"value1": 'asdasd', "value2": 'dskksks', "value3": 'ldlsld', "value4": 'sdlsld'}]

Outputs

    {'value4': 'gk', 'value3': 'abcd', 'value2': 'dsfds', 'value1': 'fssd'}
    {'value4': 'sdfsdf', 'value3': 'dafdd', 'value2': 'asdas', 'value1': 'asdasd'}
    {'value4': 'sdlsld', 'value3': 'ldlsld', 'value2': 'dskksks', 'value1': 'asdasd'}
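
    # Caution: this removes items from the list while iterating over it, which
    # can cause elements to be skipped, and it treats two dictionaries as
    # duplicates when either value3 or value4 matches.
    for dic in list:
        for anotherdic in list:
            if dic != anotherdic:
                if dic["value3"] == anotherdic["value3"] or dic["value4"] == anotherdic["value4"]:
                    list.remove(anotherdic)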

Tested with

    list = [{"value1": 'fssd', "value2": 'dsfds', "value3": 'abcd', "value4": 'gk'},
            {"value1": 'asdasd', "value2": 'asdas', "value3": 'dafdd', "value4": 'sdfsdf'},
            {"value1": 'sdfsf', "value2": 'sdfsdf', "value3": 'abcd', "value4": 'gk'},
            {"value1": 'asddas', "value2": 'asdsa', "value3": 'abcd', "value4": 'gk'},
            {"value1": 'asdasd', "value2": 'dskksks', "value3": 'ldlsld', "value4": 'sdlsld'}]

worked fine for me :)
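Removing items from a list while looping over it can skip elements, so a variant that builds a new list may be safer. The following is only a sketch: the function name dedupe_pairwise and the sample data are made up for illustration, and it assumes, as the question states, that a duplicate is a dictionary whose value3 and value4 both match an earlier one. Because it compares with ==, the values do not need to be hashable.

```python
def dedupe_pairwise(items):
    """Return a new list keeping the first dict for each (value3, value4) pair."""
    kept = []
    for candidate in items:
        # A candidate is a duplicate if some already-kept dict matches on both fields.
        is_duplicate = any(
            candidate["value3"] == other["value3"]
            and candidate["value4"] == other["value4"]
            for other in kept
        )
        if not is_duplicate:
            kept.append(candidate)
    return kept


# Illustrative data; lists are unhashable but can still be compared with ==.
sample = [
    {"value1": "a", "value3": ["x"], "value4": "gk"},
    {"value1": "b", "value3": ["y"], "value4": "gk"},
    {"value1": "c", "value3": ["x"], "value4": "gk"},
]
unique = dedupe_pairwise(sample)
print([d["value1"] for d in unique])  # ['a', 'b']
```

This does a quadratic number of comparisons, which is likely fine for short lists.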


This is a list containing one dictionary. Assuming that there are more dictionaries in the list l:

    l = [ldict for ldict in l
         if ldict.get("value3") != value3 or ldict.get("value4") != value4]

But is that what you really want to do? You may need to clarify your description.

BTW, do not use list as a name, since it is the name of a Python built-in.

EDIT: Assuming you started with a list of dictionaries, rather than a list of lists of one dictionary each, that should work with your example. It would not work if either of the values were None, so something like:

    l = [ldict for ldict in l
         if not (("value3" in ldict and ldict["value3"] == value3) and
                 ("value4" in ldict and ldict["value4"] == value4))]

But it still seems an unusual data structure.

EDIT: no need to use explicit gets.

In addition, there are always trade-offs between solutions. Without additional information and without actual measurement, it is hard to know which performance trade-offs matter most for the problem. But, as the Zen of Python says: "Simple is better than complex."
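In that spirit, one commonly used approach is to remember the (value3, value4) pairs already seen in a set and keep only the first dictionary for each pair. This is a sketch rather than a definitive implementation: the function name dedupe_by_keys and its keys parameter are hypothetical, and it assumes the compared values are hashable (strings, as in the question's example).

```python
def dedupe_by_keys(items, keys=("value3", "value4")):
    """Keep the first dict for each distinct combination of the given keys."""
    seen = set()
    result = []
    for d in items:
        marker = tuple(d[k] for k in keys)  # assumes the values are hashable
        if marker not in seen:
            seen.add(marker)
            result.append(d)
    return result


L = [{"value1": "fssd", "value2": "dsfds", "value3": "abcd", "value4": "gk"},
     {"value1": "asdasd", "value2": "asdas", "value3": "dafdd", "value4": "sdfsdf"},
     {"value1": "sdfsf", "value2": "sdfsdf", "value3": "abcd", "value4": "gk"},
     {"value1": "asddas", "value2": "asdsa", "value3": "abcd", "value4": "gk"},
     {"value1": "asdasd", "value2": "dskksks", "value3": "ldlsld", "value4": "sdlsld"}]

deduped = dedupe_by_keys(L)
print([d["value1"] for d in deduped])  # ['fssd', 'asdasd', 'asdasd']
```

It makes a single pass over the list and happens to preserve the original order, although the question does not require that.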


If I understand correctly, you want to discard matches that appear later in the original list, but do not care about the order of the resulting list, so:

(Tested with 2.5.2)

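    tempDict = {}
    # Walk the list backwards so that the earliest entry for each
    # (value3, value4) pair is the one that ends up stored.
    for d in L[::-1]:
        tempDict[(d["value3"], d["value4"])] = d
    # values() works in both Python 2 and 3 (itervalues() is Python 2 only).
    L[:] = tempDict.values()
    tempDict = None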