Python Merge Dictionaries with Custom Merge Function

Question

I want to combine two dictionaries A and B so that the result contains:

All pairs from A, where the key is unique to A
All pairs from B, where the key is unique to B
f(valueA, valueB), where the same key exists in both A and B

For example:

    def f(x, y):
        return x * y

    A = {1:1, 2:3}
    B = {7:3, 2:2}
    C = merge(A, B)

Output:

 {1:1, 7:3, 2:6}

It seems like there should be a good one-liner for this.

+4

python dictionary coding-style

Stefan May 15 '13 at 9:00 a.m.

8 answers

Borrowing the A.get(k, B.get(k)) snippet from @MartijnPieters:

    >>> def f(x, y): return x * y
    >>> A = {1:1, 2:3}
    >>> B = {7:3, 2:2}
    >>> {k: f(A[k], B[k]) if k in A and k in B else A.get(k, B.get(k)) for k in A.viewkeys() | B.viewkeys()}
    {1: 1, 2: 6, 7: 3}

+1

jamylak May 15 '13 at 9:18

Here is my solution code in Python 3 for the general case.

First I wrote a merge function, and then extended it to a more general merge_with function, which takes a function and a variable number of dictionaries. When a key appears in more than one of these dictionaries, the provided function is applied to the values stored under that key.

The merge function can be redefined in terms of merge_with, as merger is below. The name merger means merging everything and keeping the rightmost value when there are duplicates. The mergel function does the same thing but keeps the leftmost value.

All functions here (merge, merge_with, mergel and merger) are generic in that they accept an arbitrary number of dictionary arguments. In particular, merge_with must be given a function compatible with the data to which it will be applied.

    from functools import reduce
    from operator import or_

    def merge(*dicts):
        return { k: reduce(lambda d, x: x.get(k, d), dicts, None)
                 for k in reduce(or_, map(lambda x: x.keys(), dicts), set()) }

    def merge_with(f, *dicts):
        return { k: (lambda x: f(*x) if len(x)>1 else x[0])([ d[k] for d in dicts if k in d ])
                 for k in reduce(or_, map(lambda x: x.keys(), dicts), set()) }

    mergel = lambda *dicts: merge_with(lambda *x: x[0], *dicts)
    merger = lambda *dicts: merge_with(lambda *x: x[-1], *dicts)

Test

    >>> squares = { k:k*k for k in range(4) }
    >>> squares
    {0: 0, 1: 1, 2: 4, 3: 9}
    >>> cubes = { k:k**3 for k in range(2,6) }
    >>> cubes
    {2: 8, 3: 27, 4: 64, 5: 125}
    >>> merger(squares, cubes)
    {0: 0, 1: 1, 2: 8, 3: 27, 4: 64, 5: 125}
    >>> merger(cubes, squares)
    {0: 0, 1: 1, 2: 4, 3: 9, 4: 64, 5: 125}
    >>> mergel(squares, cubes)
    {0: 0, 1: 1, 2: 4, 3: 9, 4: 64, 5: 125}
    >>> mergel(cubes, squares)
    {0: 0, 1: 1, 2: 8, 3: 27, 4: 64, 5: 125}
    >>> merge(squares, cubes)
    {0: 0, 1: 1, 2: 8, 3: 27, 4: 64, 5: 125}
    >>> merge(cubes, squares)
    {0: 0, 1: 1, 2: 4, 3: 9, 4: 64, 5: 125}
    >>> merge_with(lambda x, y: x+y, squares, cubes)
    {0: 0, 1: 1, 2: 12, 3: 36, 4: 64, 5: 125}
    >>> merge_with(lambda x, y: x*y, squares, cubes)
    {0: 0, 1: 1, 2: 32, 3: 243, 4: 64, 5: 125}

Update

After writing the above, I found another way to do this.

    from functools import reduce

    def merge(*dicts):
        return reduce(lambda d1, d2: reduce(lambda d, t: dict(list(d.items())+[t]),
                                            d2.items(), d1),
                      dicts, {})

    def merge_with(f, *dicts):
        return reduce(lambda d1, d2: reduce(lambda d, t: dict(list(d.items()) +
                                                               [(t[0], f(d[t[0]], t[1]) if t[0] in d else t[1])]),
                                            d2.items(), d1),
                      dicts, {})

    mergel = lambda *dicts: merge_with(lambda x, y: x, *dicts)
    merger = lambda *dicts: merge_with(lambda x, y: y, *dicts)

Note that the definitions of mergel and merger in terms of merge_with have changed: they now pass different functions as the first argument, because the function f must now be binary. The above tests still work. Here are some more tests to show the generality of these functions.

    >>> merge() == {}
    True
    >>> merge(squares) == squares
    True
    >>> merge(cubes) == cubes
    True
    >>> mergel() == {}
    True
    >>> mergel(squares) == squares
    True
    >>> mergel(cubes) == cubes
    True
    >>> merger() == {}
    True
    >>> merger(squares) == squares
    True
    >>> merger(cubes) == cubes
    True
    >>> merge_with(lambda x, y: x+y, squares, cubes, squares)
    {0: 0, 1: 2, 2: 16, 3: 45, 4: 64, 5: 125}
    >>> merge_with(lambda x, y: x*y, squares, cubes, squares)
    {0: 0, 1: 1, 2: 128, 3: 2187, 4: 64, 5: 125}
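To make the "f must be binary" requirement more concrete, here is a minimal sketch of the same idea written differently: collect the values for each key in argument order, then fold them left to right with functools.reduce. This is not the author's code; the helper name grouped and the use of defaultdict are assumptions made for illustration.

```python
from collections import defaultdict
from functools import reduce

def merge_with(f, *dicts):
    """Combine values of duplicate keys with a binary f, folding left to right."""
    grouped = defaultdict(list)  # key -> values in the order the dicts were given
    for d in dicts:
        for k, v in d.items():
            grouped[k].append(v)
    # reduce() on a single-element list returns that element without calling f
    return {k: reduce(f, values) for k, values in grouped.items()}

# A non-commutative f makes the fold order visible.
trace = merge_with(lambda x, y: "({} {})".format(x, y),
                   {"k": "a"}, {"k": "b"}, {"k": "c"})
print(trace)  # {'k': '((a b) c)'}
```

With three dictionaries, a key present in all of them ends up as f(f(a, b), c), which is why a two-argument f is enough for any number of inputs.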

+1

Lei Zhao Feb 07 '15 at 11:20

    >>> def f(x,y):
    ...     return x*y
    ...
    >>> dict([(k,v) for k,v in A.items()] + [ (k,v) if k not in A else (k,f(A[k],B[k])) for k,v in B.items()])
    {1: 1, 2: 6, 7: 3}

0

DhruvPathak May 15 '13 at 9:13

    from itertools import chain

    intersection = set(A.keys()).intersection(B.keys())
    C = dict(chain(A.items(), B.items(), ((k, f(A[k], B[k])) for k in intersection)))

Technically this can be done as a one-liner. It works in both Python 2 and Python 3. If you only care about Python 3, you can rewrite the "intersection" line as:

    intersection = A.keys() & B.keys()

(For Python 2 only, use A.viewkeys() & B.viewkeys().)
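As a small illustration of why the Python 3 form works, the sketch below shows that dict.keys() returns a set-like view supporting the usual set operators. The variable names common, either and only_one are chosen here for illustration.

```python
A = {1: 1, 2: 3}
B = {7: 3, 2: 2}

# In Python 3, dict.keys() returns a set-like view, so set operators apply directly.
common = A.keys() & B.keys()    # keys present in both dictionaries
either = A.keys() | B.keys()    # keys present in at least one dictionary
only_one = A.keys() ^ B.keys()  # keys present in exactly one dictionary

print(common)            # {2}
print(sorted(either))    # [1, 2, 7]
print(sorted(only_one))  # [1, 7]
```

Each operator returns a plain set, so the results can be iterated or stored without keeping the original dictionaries alive.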

0

kampu May 15 '13 at 9:25

 dict(list(A.items()) + list(B.items()) + [(k,f(A[k],B[k])) for k in A.keys() & B.keys()])

In my opinion, this is the shortest and most readable code in Python 3. I derived it from DhruvPathak's answer and realized that optimizing it leads to kampu's answer, specialized for Python 3:

 dict(itertools.chain(A.items(), B.items(), ((k,f(A[k],B[k])) for k in A.keys() & B.keys())))

I compared all the answers here for efficiency and got this ranking:

mergeLZ: 34.0ms (Lei Zhao, rather bulky one-liner)
mergeJK: 11.6ms (jamylak)
mergeMP: 11.5ms (Martijn Pieters, almost a one-liner)
mergeDP: 6.9ms (DhruvPathak)
mergeDS: 6.8ms (1st one-liner above)
mergeK3: 5.2ms (kampu = 2nd one-liner above)
mergeS3: 3.5ms (imperative, not a one-liner)

where the last one, mergeS3, is naive, imperative, multi-line code. I am disappointed that the old ways prevail when it comes to performance. This test uses simple integer keys and values, but the ranking is very similar for large string keys and values. Obviously, mileage may vary depending on the size of the dictionaries and the proportion of matching keys (1/3 in my test). By the way, Lei Zhao's second implementation, which I did not try to understand, seems to have terrible performance, about 1000 times slower.

The code:

    import functools
    import itertools
    import operator
    import timeit

    def t(x): # transform keys and values
        return x # str(x) * 8

    def f(x,y): # merge values
        return x + y

    N = 10000
    A = {t(k*2): t(k*22) for k in range(N)}
    B = {t(k*3): t(k*33) for k in range(N)}

    def check(AB):
        assert(len(A) == N)
        assert(len(B) == N)
        assert(len(AB) == 16666)
        assert(AB[t(0)] == f(t(0), t(0)))
        assert(t(1) not in AB)
        assert(AB[t(2)] == t(1*22))
        assert(AB[t(3)] == t(1*33))
        assert(AB[t(4)] == t(2*22))
        assert(t(5) not in AB)
        assert(AB[t(6)] == f(t(3*22), t(2*33)))
        assert(t(7) not in AB)
        assert(AB[t(8)] == t(4*22))
        assert(AB[t(9)] == t(3*33))

    def mergeLZ(): # Lei Zhao
        merged = {k: (lambda x: f(*x) if len(x)>1 else x[0])([ d[k] for d in [A, B] if k in d ])
                  for k in functools.reduce(operator.or_, map(lambda x: x.keys(), [A, B]), set()) }
        check(merged)

    def mergeJK(): # jamylak
        merged = {k: f(A[k], B[k]) if k in A and k in B else A.get(k, B.get(k)) for k in A.keys() | B.keys()}
        check(merged)

    def mergeMP(): # Martijn Pieters
        merged = {k: A.get(k, B.get(k)) for k in A.keys() ^ B.keys()}
        merged.update({k: f(A[k], B[k]) for k in A.keys() & B.keys()})
        check(merged)

    def mergeDP(): # DhruvPathak
        merged = dict([(k,v) for k,v in A.items()] + [ (k,v) if k not in A else (k,f(A[k],B[k])) for k,v in B.items()])
        check(merged)

    def mergeDS(): # more elegant (IMO) variation on DhruvPathak
        merged = dict(list(A.items()) + list(B.items()) + [(k,f(A[k],B[k])) for k in A.keys() & B.keys()])
        check(merged)

    def mergeK3(): # kampu adapted to Python 3
        merged = dict(itertools.chain(A.items(), B.items(), ((k,f(A[k],B[k])) for k in A.keys() & B.keys())))
        check(merged)

    def mergeS3(): # "naive" imperative way
        merged = A.copy()
        for k,v in B.items():
            if k in A:
                merged[k] = f(A[k], v)
            else:
                merged[k] = v
        check(merged)

    for m in [mergeLZ, mergeJK, mergeMP, mergeDP, mergeDS, mergeK3, mergeS3]:
        print("{}: {:4.1f}ms".format(m.__name__, timeit.timeit(m, number=1000)))
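Since the imperative variant came out fastest in this benchmark, it may be useful to see it as a standalone function rather than a closure over the globals A, B and f. The following is a minimal sketch under that assumption; the signature merge(a, b, f) mirrors the accepted answer but is otherwise an illustrative choice.

```python
def merge(a, b, f):
    """Return a new dict; keys present in both inputs get f(a[k], b[k])."""
    merged = dict(a)  # shallow copy, so neither input is modified
    for k, v in b.items():
        merged[k] = f(merged[k], v) if k in merged else v
    return merged

A = {1: 1, 2: 3}
B = {7: 3, 2: 2}
print(merge(A, B, lambda x, y: x * y))  # {1: 1, 2: 6, 7: 3}
```

The function does a single pass over b and one dictionary lookup per key, which is a plausible reason it avoids the overhead of building intermediate lists or key sets.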

0

Stein Nov 24 '16 at 15:22

Another approach that (IMHO) is more readable for users with a background in functional programming:

    def merge_with(f):
        def merge(a,b):
            g = lambda l: [x for x in l if x is not None]
            keys = a.keys() | b.keys()
            return {key:f(*g([a.get(key), b.get(key)])) for key in keys}
        return merge

Applying this to the OP's example:

    A = {1:1, 2:3}
    B = {7:3, 2:2}
    merge_with(lambda x,y=1: x*y)(A,B)
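One detail of this curried style is worth spelling out: because missing values are filtered with `is not None`, f is called with a single argument for keys unique to one dictionary (hence the y=1 default), and a stored value of None is treated as absent. The sketch below keeps the curried shape but uses a private sentinel instead; the name _MISSING is hypothetical and not part of the answer above.

```python
_MISSING = object()  # hypothetical sentinel marking "key not present"

def merge_with(f):
    """Return a two-dict merge function that applies f only to shared keys."""
    def merge(a, b):
        result = {}
        for key in a.keys() | b.keys():
            x = a.get(key, _MISSING)
            y = b.get(key, _MISSING)
            if x is _MISSING:
                result[key] = y        # key only in b
            elif y is _MISSING:
                result[key] = x        # key only in a
            else:
                result[key] = f(x, y)  # key in both
        return result
    return merge

A = {1: 1, 2: 3}
B = {7: 3, 2: 2}
print(merge_with(lambda x, y: x * y)(A, B) == {1: 1, 2: 6, 7: 3})  # True
```

With this variant f can be a plain binary function, and values that happen to be None are passed through unchanged.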

0

Sascha Apr 11 '18 at 9:27

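    def merge_dict(dict1, dict2):
        # update() overwrites values for duplicate keys; it does not apply a custom merge function
        dict1.update(dict2)
        print('dict3 = {}'.format(dict1))

    dict1 = {1: 'red'}
    dict2 = {2: 'black', 3: 'yellow'}
    merge_dict(dict1, dict2)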

Output:

 dict3 = {1: 'red', 2: 'black', 3: 'yellow'}

-1

dinesh Oct 31 '14 at 2:23

Martijn Pieters · Accepted Answer · 2013-05-15 09:08

Use dictionary views to achieve this; the result of dict.viewkeys() acts as a set and allows for intersections and symmetric differences:

    def merge(A, B, f):
        # Start with symmetric difference; keys either in A or B, but not both
        merged = {k: A.get(k, B.get(k)) for k in A.viewkeys() ^ B.viewkeys()}
        # Update with `f()` applied to the intersection
        merged.update({k: f(A[k], B[k]) for k in A.viewkeys() & B.viewkeys()})
        return merged

In Python 3, the .viewkeys() method was renamed to .keys(), replacing the old .keys() functionality (which returns a list in Python 2).

The above merge() method is a generic solution that works for any given f().

Demo:

    >>> def f(x, y):
    ...     return x * y
    ...
    >>> A = {1:1, 2:3}
    >>> B = {7:3, 2:2}
    >>> merge(A, B, f)
    {1: 1, 2: 6, 7: 3}
    >>> merge(A, B, lambda a, b: '{} merged with {}'.format(a, b))
    {1: 1, 2: '3 merged with 2', 7: 3}
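Following the note about Python 3 in this answer, here is a sketch of the same function with .viewkeys() replaced by .keys(). Apart from that substitution it is meant to be identical to the Python 2 version.

```python
def merge(A, B, f):
    # Symmetric difference: keys that are in A or in B, but not in both
    merged = {k: A.get(k, B.get(k)) for k in A.keys() ^ B.keys()}
    # Intersection: keys in both, combined with f
    merged.update({k: f(A[k], B[k]) for k in A.keys() & B.keys()})
    return merged

A = {1: 1, 2: 3}
B = {7: 3, 2: 2}
print(merge(A, B, lambda x, y: x * y) == {1: 1, 2: 6, 7: 3})  # True
```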
