How to compare two numpy arrays of different sizes and return an index column with common elements?

Question


I have two numpy arrays of different sizes. The first has an index column as well as xyz coordinates; the second contains only coordinates. (Please ignore the first serial-number column; I could not get the formatting right.) The second array has fewer coordinates, and I need the indices (atomID) of those coordinates from the first array.

Array1 (with index column):

serialNo. moleculeID atomID xyz

1 1 2 0 7.7590151 7.2925348 12.5933323
2 1 2 0 7.123642 6.1970949 11.5622416
3 1 6 0 6.944543 7.0390449 12.0713224
4 1 2 0 8.8900348 11.5477333 13.5633965
5 1 2 0 7.857268 12.8062735 13.4357052
6 1 6 0 8.2124357 12.1004238 14.0486889

Array2 (only coordinates):

xyz

7.7590151 7.2925348 12.5933323
7.123642 6.1970949 11.5622416
6.944543 7.0390449 12.0713224
8.8900348 11.5477333 13.5633965

In the array with the index column, the atomID values are 2, 2, 6, 2, 2 and 6. How can I get the indices of the coordinates that are common to Array1 and Array2? I expect to get 2 2 6 2 as a list and then combine it with the second array. Any simple ideas?
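A minimal sketch of the lookup being asked for, assuming Array1 is stored with the seven columns shown above (serialNo, moleculeID, atomID, an unlabelled column of zeros, x, y, z) and that matching coordinates are bit-for-bit identical because they come from the same source. The names array1, array2, lookup and atom_ids are illustrative only:

```python
import numpy as np

# Array1 as displayed in the question (assumed layout):
# serialNo, moleculeID, atomID, (unlabelled column of zeros), x, y, z
array1 = np.array([
    [1, 1, 2, 0, 7.7590151, 7.2925348, 12.5933323],
    [2, 1, 2, 0, 7.123642, 6.1970949, 11.5622416],
    [3, 1, 6, 0, 6.944543, 7.0390449, 12.0713224],
    [4, 1, 2, 0, 8.8900348, 11.5477333, 13.5633965],
    [5, 1, 2, 0, 7.857268, 12.8062735, 13.4357052],
    [6, 1, 6, 0, 8.2124357, 12.1004238, 14.0486889],
])

# Array2: coordinates only
array2 = np.array([
    [7.7590151, 7.2925348, 12.5933323],
    [7.123642, 6.1970949, 11.5622416],
    [6.944543, 7.0390449, 12.0713224],
    [8.8900348, 11.5477333, 13.5633965],
])

# Map each (x, y, z) triple to its atomID; assumes exact float equality
lookup = {tuple(row[-3:]): row[2] for row in array1}

# atomIDs in the order of array2 (a KeyError means a coordinate is missing from array1)
atom_ids = np.array([lookup[tuple(xyz)] for xyz in array2])

# Prepend the IDs to the coordinates as a new first column
combined = np.column_stack((atom_ids, array2))
print(atom_ids)  # [2. 2. 6. 2.]
```

A dictionary keeps each lookup cheap, but it only works with exact matches; if the coordinates may carry rounding noise, a tolerance-based comparison is needed instead.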

Update:

I tried the following code, but it doesn't seem to work:

    import numpy as np
    a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])
    b = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]])
    print a
    print b
    for i in range(len(b)):
        for j in range(len(a)):
            if a[j,1]==b[i,0]:
                x = np.insert(b, 0, a[i,0], axis=1) #(input array, position to insert, value to insert, axis)
                #continue
            else:
                print 'not true'
    print x

which outputs the following:

    not true
    not true
    not true
    not true
    not true
    not true
    not true
    not true
    not true
    [[ 3. 2.2 5. ]
     [ 3. -6.3 0. ]
     [ 3. 3.6 8. ]]

but I expected:

    [[ 4. 2.2 5. ]
     [ 2. -6.3 0. ]
     [ 3. 3.6 8. ]]
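The loop above goes wrong in two places: it inserts a[i,0] (indexed by the row of b) instead of a[j,0] (the matching row of a), and every match rebuilds x from the unmodified b, so only the last match survives, which is why the whole column is 3. A minimal corrected sketch, written for Python 3 and comparing the whole coordinate row rather than only its first value (that choice is an assumption, not part of the original code):

```python
import numpy as np

a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])
b = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]])

ids = []
for i in range(len(b)):
    for j in range(len(a)):
        # compare the full coordinate row, not only the first value
        if np.array_equal(a[j, 1:], b[i]):
            ids.append(a[j, 0])  # index with j: the matching row of a
            break  # stop at the first match for this row of b

# insert all collected IDs at once as a new first column
x = np.insert(b, 0, ids, axis=1)
print(x)
```

Collecting the IDs first and calling np.insert once avoids overwriting x on every match.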


python arrays numpy

Rafat Aug 4 '15 at 19:52


4 answers

Two brief vectorized ways to do this with cdist -

    from scipy.spatial.distance import cdist
    out = a[np.any(cdist(a[:,1:],b)==0,axis=1)]

Or, if you don't mind getting a little voodoo-ish, here np.einsum replaces np.any -

 out = a[np.einsum('ij->i',cdist(a[:,1:],b)==0)]

Run Example -

    In [15]: from scipy.spatial.distance import cdist

    In [16]: a
    Out[16]:
    array([[ 4. ,  2.2,  5. ],
           [ 2. , -6.3,  0. ],
           [ 3. ,  3.6,  8. ],
           [ 5. , -9.8, 50. ]])

    In [17]: b
    Out[17]:
    array([[ 2.2,  5. ],
           [-6.3,  0. ],
           [ 3.6,  8. ]])

    In [18]: a[np.any(cdist(a[:,1:],b)==0,axis=1)]
    Out[18]:
    array([[ 4. ,  2.2,  5. ],
           [ 2. , -6.3,  0. ],
           [ 3. ,  3.6,  8. ]])

    In [19]: a[np.einsum('ij->i',cdist(a[:,1:],b)==0)]
    Out[19]:
    array([[ 4. ,  2.2,  5. ],
           [ 2. , -6.3,  0. ],
           [ 3. ,  3.6,  8. ]])
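Testing cdist(...)==0 only succeeds when coordinates are bit-for-bit identical, and the mask returns rows of a in a's order rather than the atomIDs in b's order that the question asks for. A sketch that addresses both, using an assumed tolerance tol (not part of the original answer) and argmin to pick the closest row of a for each row of b:

```python
import numpy as np
from scipy.spatial.distance import cdist

a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])
b = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]])

tol = 1e-8  # assumed tolerance; choose one that suits the precision of the data

# d[j, i] is the Euclidean distance between a[j, 1:] and b[i]
d = cdist(a[:, 1:], b)

# Rows of a whose coordinates are within tol of some row of b (kept in a's order)
out = a[np.any(d <= tol, axis=1)]

# For each row of b: the closest row of a, and whether it is close enough
nearest = d.argmin(axis=0)
found = d[nearest, np.arange(len(b))] <= tol
ids = a[nearest, 0]  # only meaningful where found is True

combined = np.column_stack((ids, b))
print(combined)
```

cdist still builds the full len(a) × len(b) distance matrix, so memory grows quadratically with the input sizes.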


Divakar Aug 05 '15 at 5:43


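A simple loop-based version for your question (assuming array1 and array2 are laid out as shown in the question):

    import numpy as np

    result = []
    for xyz in array2:
        for element in array1:
            if np.array_equal(element[-3:], xyz):  # compare the coordinates of the two elements
                # np.insert returns a new array, so keep its result;
                # element[2] is the atomID column as displayed in the question
                result.append(np.insert(xyz, 0, element[2]))  # atomID goes in front of the coordinates
                break
    result = np.array(result)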


Rafael rios Aug 4 '15 at 20:09


Using a list instead of an array for np.insert values did the trick.

    import numpy as np
    a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])
    b = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]])
    print a
    print b
    x = []
    for i in range(len(b)):
        for j in range(len(a)):
            if a[j,1]==b[i,0]:
                x.append(a[j,0])
            else:
                x = x
    print np.insert(b,0,x,axis=1)

which outputs:

    [[ 4. 2.2 5. ]
     [ 2. -6.3 0. ]
     [ 3. 3.6 8. ]]
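The loop above compares only the first coordinate (a[j,1]==b[i,0]), so two points that share that value but differ elsewhere would be treated as a match. A vectorized sketch in Python 3 that compares every coordinate with NumPy broadcasting (no SciPy needed; note that the intermediate boolean array has len(a) × len(b) × 2 entries):

```python
import numpy as np

a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])
b = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]])

# matches[j, i] is True when every coordinate of a[j, 1:] equals b[i]
matches = (a[:, None, 1:] == b[None, :, :]).all(axis=2)

# First matching row of a for each row of b.
# argmax returns 0 when a column has no True, so check with any().
j = matches.argmax(axis=0)
found = matches.any(axis=0)

combined = np.column_stack((a[j, 0], b))
print(combined)
```

Like the original, this relies on exact equality; the any() check is what distinguishes a real match at row 0 from "no match".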


Rafat Aug 4 '15 at 10:51


Eelco hoogendoorn · Accepted Answer · 2016-04-29T13:33:17+0000

The numpy_indexed package (disclaimer: I am the author of it) contains the functionality to solve such problems in an elegant and efficient / vectorized way:

    import numpy_indexed as npi
    print(a[npi.contains(b, a[:, 1:])])

The currently accepted answer looks incorrect to me for points that differ only in their later coordinates, since it compares just the first one. Performance should also be much better: not only is this solution vectorized, its worst case is O(N log N) rather than the quadratic time complexity of the currently accepted answer.
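For readers who would rather not add a dependency, here is a pure-NumPy, sort-based sketch of the same idea. It is not how numpy_indexed is implemented; it only illustrates how sorting gives O(N log N) matching under exact equality. The names labels, row_of_label and idx are illustrative:

```python
import numpy as np

a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])
b = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]])

# Give every coordinate row a label; identical rows share a label.
# np.unique sorts the stacked rows, which is the O(N log N) step.
_, labels = np.unique(np.vstack((a[:, 1:], b)), axis=0, return_inverse=True)
labels = labels.reshape(-1)  # keep the inverse 1-D across NumPy versions
labels_a, labels_b = labels[:len(a)], labels[len(a):]

# Rows of a whose coordinates also occur in b (the same mask the answer builds)
out = a[np.isin(labels_a, labels_b)]

# Row of a for each row of b: -1 where there is no match,
# and if a contains duplicate coordinates only one of them is kept
row_of_label = np.full(labels.max() + 1, -1)
row_of_label[labels_a] = np.arange(len(a))
idx = row_of_label[labels_b]

print(out)
print(idx)
```

Check idx for -1 before using it to index a, since a[-1] would silently return the last row.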
