In Python, how do I join two arrays on key columns?

Suppose I have two arrays (after importing numpy as np),

 a=np.array([['a',1],['b',2]],dtype=object)

and

 b=np.array([['b',3],['c',4]],dtype=object) 

How do I get:

 c=np.array([['a',1,None],['b',2,3],['c',None,4]],dtype=object) 

Basically, joining using the first column as a key.

thanks

2 answers

A pure Python approach for this would be

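 da = dict(a)  # key -> value for each row of a
 db = dict(b)  # key -> value for each row of b
 keys = sorted(set(da) | set(db))  # union of keys; sorted so the row order is deterministic
 c = np.array([(k, da.get(k), db.get(k)) for k in keys], dtype=object)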

But if you are using NumPy, your arrays are probably large and you want a solution with better performance. In that case, I suggest using a real database, e.g. the sqlite3 module that comes with Python.
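
A minimal sketch of the sqlite3 route, assuming an in-memory database and that the keys are unique within each array. The table and column names (a, b, k, v) are made up for illustration. Older SQLite builds do not support FULL OUTER JOIN, so this emulates it with a LEFT JOIN plus the rows that exist only in b:

```python
import sqlite3

import numpy as np

a = np.array([['a', 1], ['b', 2]], dtype=object)
b = np.array([['b', 3], ['c', 4]], dtype=object)

# In-memory database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (k TEXT PRIMARY KEY, v INTEGER)")
conn.execute("CREATE TABLE b (k TEXT PRIMARY KEY, v INTEGER)")
conn.executemany("INSERT INTO a VALUES (?, ?)", a.tolist())
conn.executemany("INSERT INTO b VALUES (?, ?)", b.tolist())

# Emulated full outer join: every row of a (matched against b where possible),
# then the rows of b whose key does not appear in a.
rows = conn.execute("""
    SELECT a.k, a.v, b.v FROM a LEFT JOIN b ON a.k = b.k
    UNION ALL
    SELECT b.k, NULL, b.v FROM b WHERE b.k NOT IN (SELECT k FROM a)
    ORDER BY 1
""").fetchall()
conn.close()

# SQL NULL comes back as None, which matches the desired output.
c = np.array(rows, dtype=object)
```

For arrays that really are large, a file-backed database instead of ":memory:" would keep the data out of RAM.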


The best solution I found is to use pandas, which handles joins very well, and pandas objects are easily converted to and from NumPy arrays.
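
A rough sketch of what that could look like for the arrays in the question. The column names (key, a_val, b_val) are my own. pandas marks missing values as NaN, so the last step swaps those for None to match the requested output:

```python
import numpy as np
import pandas as pd

a = np.array([['a', 1], ['b', 2]], dtype=object)
b = np.array([['b', 3], ['c', 4]], dtype=object)

# Column names are illustrative; only "key" has to match on both sides.
df_a = pd.DataFrame(a, columns=["key", "a_val"])
df_b = pd.DataFrame(b, columns=["key", "b_val"])

# Outer join keeps keys that appear in either frame; sort=True orders by key.
merged = df_a.merge(df_b, on="key", how="outer", sort=True)

# Back to an object array, replacing pandas' missing-value marker with None.
c = np.array(
    [[None if pd.isna(v) else v for v in row]
     for row in merged.to_numpy(dtype=object)],
    dtype=object,
)
```

If NaN is acceptable for the missing entries, merged.to_numpy() alone is enough.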
