How to make the dictionaries in a list unique?

I have a list of dictionaries in Python that looks like this:

    d = [{'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100}, {'feature_a': 2, 'feature_b': 'Jul', 'feature_c': 150}, {'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110}, ...]

I want to keep only the entries that have a unique combination of feature_a, feature_b and feature_c.

For example, if we have 3 entries with the same feature_a and feature_b, but with feature_c values of 100, 100 and 150, then after the operation only two entries should remain: one with feature_c 100 and one with feature_c 150.

How can I achieve this?

UPDATE:

OK, thanks to Anand's excellent answer, it works great. However, I have one more question.

Suppose we have a new feature_d, and the list looks like this:

    d = [{'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100, 'feature_d': 'A'}, {'feature_a': 2, 'feature_b': 'Jul', 'feature_c': 150, 'feature_d': 'B'}, {'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110, 'feature_d': 'F'}, ...]

and I only want to deduplicate on feature_a, feature_b and feature_c, leaving feature_d out of the comparison. How can I achieve this?

Thank you very much.

python unique
Aug 03 '15 at 16:49
1 answer

If the order of the original list d is not important, you can take the .items() of each dictionary and convert it to a frozenset(), which is hashable. You can then put all of those into a set() or frozenset(), which drops the duplicates, and finally convert each inner frozenset() back into a dictionary. Example -

 uniq_d = list(map(dict, frozenset(frozenset(i.items()) for i in d))) 

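Sets do not allow duplicate elements, so exact duplicates collapse into one entry, although you will lose the order of the list. This also requires every value in the dictionaries to be hashable. In Python 2.x, list(...) is not required, since map() already returns a list there.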




Example / Demo -

    >>> import pprint
    >>> pprint.pprint(d)
    [{'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100},
     {'feature_a': 2, 'feature_b': 'Jul', 'feature_c': 150},
     {'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110},
     {'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100},
     {'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 150}]
    >>> uniq_d = list(map(dict, frozenset(frozenset(i.items()) for i in d)))
    >>> pprint.pprint(uniq_d)
    [{'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100},
     {'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 150},
     {'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110},
     {'feature_a': 2, 'feature_b': 'Jul', 'feature_c': 150}]
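
If you do need to keep the original order, the same frozenset idea can be combined with a "seen" set instead of converting the whole list to a set. This is only a minimal sketch, assuming every value in the dictionaries is hashable; the function name dedupe_keep_order is a hypothetical name chosen for illustration:

```python
def dedupe_keep_order(dicts):
    """Drop exact duplicate dicts, keeping the first occurrence of each in order."""
    seen = set()
    result = []
    for item in dicts:
        # frozenset of the (key, value) pairs is a hashable stand-in for the dict;
        # this assumes all values are hashable.
        key = frozenset(item.items())
        if key not in seen:
            seen.add(key)
            result.append(item)
    return result


d = [
    {'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100},
    {'feature_a': 2, 'feature_b': 'Jul', 'feature_c': 150},
    {'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110},
    {'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100},  # duplicate of the first entry
    {'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 150},
]
uniq_d = dedupe_keep_order(d)
```

Here uniq_d holds the four distinct dictionaries in the order they first appeared in d.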



For a new requirement -

However, if I have another feature_d, but I only want to deduplicate on feature_a, feature_b and feature_c

If two entries have the same feature_a, feature_b and feature_c, they are considered the same and deduplicated, regardless of what is in feature_d

A simple way to do this is to use a set and a new list: add to the set only the features that you want to deduplicate on, and check only those features when deciding whether to keep an entry. Example -

    seen_set = set()
    new_d = []
    for i in d:
        if tuple([i['feature_a'],i['feature_b'],i['feature_c']]) not in seen_set:
            new_d.append(i)
            seen_set.add(tuple([i['feature_a'],i['feature_b'],i['feature_c']]))

Example / Demo -

    >>> d = [{'feature_a':1, 'feature_b':'Jul', 'feature_c':100, 'feature_d':'A'},
    ...      {'feature_a':2, 'feature_b':'Jul', 'feature_c':150, 'feature_d': 'B'},
    ...      {'feature_a':1, 'feature_b':'Mar', 'feature_c':110, 'feature_d':'F'},
    ...      {'feature_a':1, 'feature_b':'Mar', 'feature_c':110, 'feature_d':'G'}]
    >>> seen_set = set()
    >>> new_d = []
    >>> for i in d:
    ...     if tuple([i['feature_a'],i['feature_b'],i['feature_c']]) not in seen_set:
    ...         new_d.append(i)
    ...         seen_set.add(tuple([i['feature_a'],i['feature_b'],i['feature_c']]))
    ...
    >>> pprint.pprint(new_d)
    [{'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100, 'feature_d': 'A'},
     {'feature_a': 2, 'feature_b': 'Jul', 'feature_c': 150, 'feature_d': 'B'},
     {'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110, 'feature_d': 'F'}]
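
If the set of features to compare may change, the loop can be wrapped in a small helper that takes the key names as a parameter. This is a sketch rather than a definitive implementation: dedupe_by and its keys parameter are hypothetical names, and it assumes every listed key is present in each dictionary with a hashable value.

```python
from operator import itemgetter


def dedupe_by(dicts, keys):
    """Keep the first dict seen for each combination of the given keys."""
    # itemgetter(*keys) returns a tuple of values for two or more keys
    # (a single value for one key); either way the result is used as the set key.
    get_key = itemgetter(*keys)
    seen = set()
    result = []
    for item in dicts:
        key = get_key(item)
        if key not in seen:
            seen.add(key)
            result.append(item)
    return result


d = [
    {'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100, 'feature_d': 'A'},
    {'feature_a': 2, 'feature_b': 'Jul', 'feature_c': 150, 'feature_d': 'B'},
    {'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110, 'feature_d': 'F'},
    {'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110, 'feature_d': 'G'},
]
new_d = dedupe_by(d, ['feature_a', 'feature_b', 'feature_c'])
```

As in the loop version, the first entry for each combination wins, so the entry with feature_d 'G' is dropped and the original order is preserved.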
Aug 03 '15 at 17:16


