Remove duplicates from nested dictionaries in a list

Question

A quick and very simple newbie question.

If I have a list of dictionaries looking like this:

    L = []
    L.append({"value1": value1, "value2": value2, "value3": value3, "value4": value4})

Suppose several entries have value3 and value4 identical to those of other dictionaries in the list. How can I quickly and easily find and delete these duplicate dictionaries?

Keeping order does not matter.

Thanks.

EDIT:

If there are five inputs, for example:

    L = [{"value1": fssd, "value2": dsfds, "value3": abcd, "value4": gk},
         {"value1": asdasd, "value2": asdas, "value3": dafdd, "value4": sdfsdf},
         {"value1": sdfsf, "value2": sdfsdf, "value3": abcd, "value4": gk},
         {"value1": asddas, "value2": asdsa, "value3": abcd, "value4": gk},
         {"value1": asdasd, "value2": dskksks, "value3": ldlsld, "value4": sdlsld}]

The output should be:

    L = [{"value1": fssd, "value2": dsfds, "value3": abcd, "value4": gk},
         {"value1": asdasd, "value2": asdas, "value3": dafdd, "value4": sdfsdf},
         {"value1": asdasd, "value2": dskksks, "value3": ldlsld, "value4": sdlsld}]

+4

python dictionary

Jonas Aug 14 '09 at 19:41


6 answers

Here is one way:

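    from itertools import groupby

    # Key on the two fields that define a duplicate.
    keyfunc = lambda d: (d['value3'], d['value4'])
    # groupby only merges adjacent items, so sort by the same key first.
    giter = groupby(sorted(L, key=keyfunc), keyfunc)
    # Keep the first dictionary of each group.
    L2 = [next(g[1]) for g in giter]
    print(L2)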

+7

ars Aug 14 '09 at 21:03


You can use a temporary list to store the dictionary items already seen. The earlier version of this code was buggy because it removed items inside the for loop.

    (v, r) = ([], [])
    for i in l:
        # Keep i only if neither its value3 nor its value4 pair has been seen.
        if ('value4', i['value4']) not in v and ('value3', i['value3']) not in v:
            r.append(i)
        v.extend(i.items())
    l = r

Your test:

    l = [{"value1": 'fssd', "value2": 'dsfds', "value3": 'abcd', "value4": 'gk'},
         {"value1": 'asdasd', "value2": 'asdas', "value3": 'dafdd', "value4": 'sdfsdf'},
         {"value1": 'sdfsf', "value2": 'sdfsdf', "value3": 'abcd', "value4": 'gk'},
         {"value1": 'asddas', "value2": 'asdsa', "value3": 'abcd', "value4": 'gk'},
         {"value1": 'asdasd', "value2": 'dskksks', "value3": 'ldlsld', "value4": 'sdlsld'}]

Output:

    {'value4': 'gk', 'value3': 'abcd', 'value2': 'dsfds', 'value1': 'fssd'}
    {'value4': 'sdfsdf', 'value3': 'dafdd', 'value2': 'asdas', 'value1': 'asdasd'}
    {'value4': 'sdlsld', 'value3': 'ldlsld', 'value2': 'dskksks', 'value1': 'asdasd'}
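A variant of the same idea, as a minimal sketch: instead of a list of seen items, it keeps a set of (value3, value4) tuples, so a dictionary is dropped only when both fields match an earlier entry, and each membership test is a hash lookup. The function name dedupe_by_keys and its keys parameter are illustrative and not part of the answer above; the sketch also assumes the values are hashable.

```python
def dedupe_by_keys(dicts, keys=("value3", "value4")):
    """Return a new list keeping the first dict for each combination of keys."""
    seen = set()
    result = []
    for d in dicts:
        # The tuple of selected values identifies a duplicate.
        marker = tuple(d[k] for k in keys)
        if marker not in seen:
            seen.add(marker)
            result.append(d)
    return result


l = [{"value1": 'fssd', "value2": 'dsfds', "value3": 'abcd', "value4": 'gk'},
     {"value1": 'asdasd', "value2": 'asdas', "value3": 'dafdd', "value4": 'sdfsdf'},
     {"value1": 'sdfsf', "value2": 'sdfsdf', "value3": 'abcd', "value4": 'gk'},
     {"value1": 'asddas', "value2": 'asdsa', "value3": 'abcd', "value4": 'gk'},
     {"value1": 'asdasd', "value2": 'dskksks', "value3": 'ldlsld', "value4": 'sdlsld'}]

l = dedupe_by_keys(l)
print([d["value1"] for d in l])  # ['fssd', 'asdasd', 'asdasd']
```

Unlike the sort-based answers, this keeps the original order of the surviving entries, which the question does not require but costs nothing here.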

+2

Acoolie Aug 14 '09 at 20:06


    for dic in list:
        for anotherdic in list:
            if dic != anotherdic:
                if dic["value3"] == anotherdic["value3"] or dic["value4"] == anotherdic["value4"]:
                    list.remove(anotherdic)

Tested with

    list = [{"value1": 'fssd', "value2": 'dsfds', "value3": 'abcd', "value4": 'gk'},
            {"value1": 'asdasd', "value2": 'asdas', "value3": 'dafdd', "value4": 'sdfsdf'},
            {"value1": 'sdfsf', "value2": 'sdfsdf', "value3": 'abcd', "value4": 'gk'},
            {"value1": 'asddas', "value2": 'asdsa', "value3": 'abcd', "value4": 'gk'},
            {"value1": 'asdasd', "value2": 'dskksks', "value3": 'ldlsld', "value4": 'sdlsld'}]

worked fine for me :)
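One caveat worth keeping in mind with loops of this shape: calling remove() on a list while a for loop is iterating over it shifts the remaining elements, so the loop can skip the element right after each removal. The small sketch below uses plain integers rather than the dictionaries from the question, and the names mutated and kept are illustrative only.

```python
# Removing during iteration: the element after each removal is skipped.
mutated = [1, 2, 2, 3]
for x in mutated:
    if x == 2:
        mutated.remove(x)
print(mutated)  # [1, 2, 3], so one of the 2s survives

# Building a new list instead visits every element exactly once.
original = [1, 2, 2, 3]
kept = [x for x in original if x != 2]
print(kept)  # [1, 3]
```

With the question's data the nested loop still ends up with three dictionaries, but which copy of a duplicate survives depends on this skipping behavior.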

+1

wallacer Aug 14 '09 at 19:55


This is a list containing one dictionary. Assuming there are more dictionaries in the list l:

 l = [ldict for ldict in l if ldict.get("value3") != value3 or ldict.get("value4") != value4]

But is that what you really want to do? You may need to clarify your description.

BTW, do not use list as a name, as it is the name of a Python built-in.

EDIT: Assuming you started with a list of dictionaries rather than a list of lists of one dictionary each, that should work with your example. It would not work if either of the values were None, so something like this is better:

 l = [ldict for ldict in l if not ( ("value3" in ldict and ldict["value3"] == value3) and ("value4" in ldict and ldict["value4"] == value4) )]

But it still seems an unusual data structure.

EDIT: No need to use explicit gets.

In addition, every solution involves trade-offs. Without more information and without actually measuring, it is hard to know which performance trade-offs matter most for the problem. But, as the Zen of Python says: "Simple is better than complex."

+1

Ned Deily Aug 14 '09 at 20:05


If I understand correctly, you want to discard matches that appear later in the original list, but do not care about the order of the resulting list, so:

(Tested with 2.5.2)

    tempDict = {}
    # Walk the list backwards so the earliest duplicate overwrites later ones.
    for d in L[::-1]:
        tempDict[(d["value3"], d["value4"])] = d
    L[:] = tempDict.itervalues()
    tempDict = None
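On Python 3, dict.itervalues() no longer exists. A rough equivalent of the snippet above, sketched under the assumption that L holds the sample data from the question, uses values() and reversed() instead. The name first_seen is illustrative.

```python
L = [{"value1": "fssd", "value2": "dsfds", "value3": "abcd", "value4": "gk"},
     {"value1": "asdasd", "value2": "asdas", "value3": "dafdd", "value4": "sdfsdf"},
     {"value1": "sdfsf", "value2": "sdfsdf", "value3": "abcd", "value4": "gk"},
     {"value1": "asddas", "value2": "asdsa", "value3": "abcd", "value4": "gk"},
     {"value1": "asdasd", "value2": "dskksks", "value3": "ldlsld", "value4": "sdlsld"}]

first_seen = {}
# Iterate from the end so that the earliest dict for each key is stored last
# and therefore wins.
for d in reversed(L):
    first_seen[(d["value3"], d["value4"])] = d

# Slice assignment accepts any iterable, so the values view can be used directly.
L[:] = first_seen.values()
print(len(L))  # 3
```

The order of the resulting list is not the original order, which matches the question's statement that order does not matter.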

0

Anon Aug 14 '09 at 21:40


Alex Martelli · Accepted Answer · 2009-08-14T22:15:18+0000

In Python 2.6 or 3.*:

    import itertools
    import operator
    import pprint

    L = [{"value1": "fssd", "value2": "dsfds", "value3": "abcd", "value4": "gk"},
         {"value1": "asdasd", "value2": "asdas", "value3": "dafdd", "value4": "sdfsdf"},
         {"value1": "sdfsf", "value2": "sdfsdf", "value3": "abcd", "value4": "gk"},
         {"value1": "asddas", "value2": "asdsa", "value3": "abcd", "value4": "gk"},
         {"value1": "asdasd", "value2": "dskksks", "value3": "ldlsld", "value4": "sdlsld"}]

    getvals = operator.itemgetter('value3', 'value4')

    # groupby only groups adjacent items, so sort on the same key first.
    L.sort(key=getvals)

    result = []
    for k, g in itertools.groupby(L, getvals):
        result.append(next(g))  # keep the first dictionary of each group

    L[:] = result
    pprint.pprint(L)

Almost the same thing works in Python 2.5, except that you have to use g.next() instead of next(g) in the append.
