Fastest way to deduplicate lists in a dict

I have a dict containing lists, and I need a fast way to deduplicate those lists.

I know how to deduplicate a single list with set(), but here I want a fast way to iterate through a dict, deduplicating each list along the way.

hello = {'test1':[2,3,4,2,2,5,6], 'test2':[5,5,8,4,3,3,8,9]} 

I would like it to look like this:

 hello = {'test1':[2,3,4,5,6], 'test2':[5,8,4,3,9]} 

I don't need to preserve the original order of the lists, though.

I tried using set like this, but it isn't quite right (it doesn't iterate correctly, and I lose the first key):

    for key, value in hello.items():
        goodbye = {key: set(value)}

    >>> goodbye
    {'test2': set([8, 9, 3, 4, 5])}

EDIT: Following PM 2Ring's comment below, I now populate the dict differently to avoid duplicates in the first place. I was using lists, but sets reject duplicates by default:

    >>> my_numbers = {}
    >>> my_numbers['first'] = [1,2,2,2,6,5]
    >>> from collections import defaultdict
    >>> final_list = defaultdict(set)
    >>> for n in my_numbers['first']:
    ...     final_list['test_first'].add(n)
    ...
    >>> final_list['test_first']
    set([1, 2, 5, 6])

As you can see, the final output is a deduplicated set, as required.

+5
5 answers

The iteration itself is fine; you are just rebinding goodbye to a brand-new dict on every pass, so only the last key survives. Create an empty dict first, then assign a value to each key inside the loop.

    goodbye = {}
    for key, value in hello.items():
        goodbye[key] = set(value)

    >>> goodbye
    {'test1': set([2, 3, 4, 5, 6]), 'test2': set([8, 9, 3, 4, 5])}
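The loop above can also be collapsed into a dict comprehension, which builds the whole result in one expression. A minimal sketch, assuming the `hello` dict from the question and that order within each list does not matter (`goodbye_lists` is a name introduced here for illustration):

```python
# Sample data from the question.
hello = {'test1': [2, 3, 4, 2, 2, 5, 6], 'test2': [5, 5, 8, 4, 3, 3, 8, 9]}

# Build a new dict in one pass; each list becomes a set, which drops duplicates.
goodbye = {key: set(value) for key, value in hello.items()}

# If lists are needed downstream, wrap each set in list().
# The order of the resulting lists is not guaranteed.
goodbye_lists = {key: list(set(value)) for key, value in hello.items()}
```

Because this creates a new dict rather than rebinding one name per iteration, no keys are lost.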

Also, since sets don't preserve order, if you want to keep the original order it's best to write a simple function that builds a new list, skipping values that have already been added.

    def uniqueList(li):
        newList = []
        for x in li:
            if x not in newList:
                newList.append(x)
        return newList

    goodbye = {}
    for key, value in hello.items():
        goodbye[key] = uniqueList(value)

    >>> goodbye
    {'test1': [2, 3, 4, 5, 6], 'test2': [5, 8, 4, 3, 9]}
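One caveat: `x not in newList` scans the list on every check, so uniqueList gets slow on long lists. On Python 3.7 and later, where dicts are guaranteed to preserve insertion order, `dict.fromkeys` is a possible order-preserving alternative, provided the elements are hashable. A sketch under those assumptions (`unique_in_order` is a hypothetical helper name, not from the thread):

```python
def unique_in_order(items):
    # dict keys are unique and (on Python 3.7+) keep insertion order,
    # so the first occurrence of each element survives.
    return list(dict.fromkeys(items))

# Sample data from the question.
hello = {'test1': [2, 3, 4, 2, 2, 5, 6], 'test2': [5, 5, 8, 4, 3, 3, 8, 9]}

goodbye = {key: unique_in_order(value) for key, value in hello.items()}
print(goodbye)
# {'test1': [2, 3, 4, 5, 6], 'test2': [5, 8, 4, 3, 9]}
```

On older interpreters, where dict ordering is not guaranteed, this would not reliably preserve order.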
+4

You can use a dict comprehension together with a deduplicate function that preserves order:

    def deduplicate(seq):
        seen = set()
        seen_add = seen.add
        return [x for x in seq if not (x in seen or seen_add(x))]

    {key: deduplicate(value) for key, value in hello.items()}
+5
    >>> hello = {'test1':[2,3,4,2,2,5,6], 'test2':[5,5,8,4,3,3,8,9]}
    >>> for key, value in hello.items():  # items() also works on Python 3, unlike iteritems()
    ...     hello[key] = list(set(value))
    ...
    >>> hello
    {'test1': [2, 3, 4, 5, 6], 'test2': [8, 9, 3, 4, 5]}
+3

Here is a more robust approach that preserves order and works in all versions of Python:

    for key in hello:
        s = set()
        l = []
        for subval in hello[key]:
            if subval not in s:
                l.append(subval)
                s.add(subval)
        hello[key] = l
0
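    my_list = [1,2,2,2,3,4,5,6,7,7,7,7,7,8,9,10]
    seen = set()
    # seen.add(x) returns None, so "not seen.add(x)" is always True;
    # it only runs (and records x) when x has not been seen yet.
    print(list(filter(lambda x: x not in seen and not seen.add(x), my_list)))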
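Since the question asks for the fastest option, it may be worth measuring rather than guessing. The following is a rough timing sketch using the standard `timeit` module; the function names, the random test data, and the repeat count are all assumptions made for illustration, and the numbers will vary by machine and Python version.

```python
import random
import timeit

def dedup_set(seq):
    # Unordered: relies on set() alone.
    return list(set(seq))

def dedup_fromkeys(seq):
    # Order-preserving on Python 3.7+ (dicts keep insertion order).
    return list(dict.fromkeys(seq))

def dedup_seen(seq):
    # Order-preserving on any version: explicit loop with a "seen" set.
    seen = set()
    out = []
    for x in seq:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

# Hypothetical test data: 10,000 integers with many duplicates.
random.seed(0)
data = [random.randrange(1000) for _ in range(10000)]

for fn in (dedup_set, dedup_fromkeys, dedup_seen):
    elapsed = timeit.timeit(lambda: fn(data), number=100)
    print('%-15s %.4f s' % (fn.__name__, elapsed))
```

If order does not matter, the plain set() conversion does the least work per element; the other two trade some speed for keeping first occurrences in place.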
0
