How to compare two NumPy arrays of different sizes and return the index column for the common elements?

I have two NumPy arrays of different sizes: one has an index column along with xyz coordinates, and the other contains only the coordinates. (Please ignore the first serial number; I could not figure out the formatting.) The second array has fewer coordinates, and I need the indices (atomID) of those coordinates from the first array.

Array1 (with index column):

    serialNo.  moleculeID  atomID  (unlabeled)  x          y           z
    1          1           2       0            7.7590151  7.2925348   12.5933323
    2          1           2       0            7.123642   6.1970949   11.5622416
    3          1           6       0            6.944543   7.0390449   12.0713224
    4          1           2       0            8.8900348  11.5477333  13.5633965
    5          1           2       0            7.857268   12.8062735  13.4357052
    6          1           6       0            8.2124357  12.1004238  14.0486889

Array2 (only coordinates):

    x          y           z
    7.7590151  7.2925348   12.5933323
    7.123642   6.1970949   11.5622416
    6.944543   7.0390449   12.0713224
    8.8900348  11.5477333  13.5633965

In the array with the index column (atomID), the indices are 2, 2, 6, 2, 2 and 6. How can I get the indices for the coordinates that are common to Array1 and Array2? I expect to get 2, 2, 6, 2 back as a list and then combine it with the second array. Any simple ideas?

Update:

I tried the following code, but it doesn't seem to work:

    import numpy as np

    a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])
    b = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]])
    print(a)
    print(b)

    for i in range(len(b)):
        for j in range(len(a)):
            if a[j,1]==b[i,0]:
                x = np.insert(b, 0, a[i,0], axis=1) #(input array, position to insert, value to insert, axis)
                #continue
            else:
                print('not true')
    print(x)

which outputs the following:

    not true
    not true
    not true
    not true
    not true
    not true
    not true
    not true
    not true
    [[ 3.   2.2  5. ]
     [ 3.  -6.3  0. ]
     [ 3.   3.6  8. ]]

but expected:

    [[ 4.   2.2  5. ]
     [ 2.  -6.3  0. ]
     [ 3.   3.6  8. ]]
4 answers

The numpy_indexed package (disclaimer: I am the author of it) contains the functionality to solve such problems in an elegant and efficient / vectorized way:

    import numpy_indexed as npi

    print(a[npi.contains(b, a[:, 1:])])

The currently accepted answer strikes me as incorrect for points that differ only in their last coordinates. Performance should also be much better here: not only is this solution vectorized, but its worst-case complexity is O(N log N) rather than the quadratic time complexity of the currently accepted answer.
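For readers who would rather not add a dependency, the sort-based idea can be sketched in plain NumPy. This is only an illustration and not a description of how numpy_indexed works internally; `rows_contained` is a hypothetical helper name, and `a` and `b` are the sample arrays from the question's update. Rows are compared in full, so points that differ only in their last coordinate are treated as different:

```python
import numpy as np

a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])
b = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]])


def rows_contained(candidates, reference):
    """Boolean mask: which rows of `candidates` also occur in `reference`."""
    stacked = np.vstack([candidates, reference])
    # np.unique sorts the rows and gives identical rows the same integer label.
    _, labels = np.unique(stacked, axis=0, return_inverse=True)
    labels = labels.ravel()  # guard against shape differences across NumPy versions
    n = len(candidates)
    # A candidate row is contained if its label also appears among the reference rows.
    return np.isin(labels[:n], labels[n:])


mask = rows_contained(a[:, 1:], b)
matched = a[mask]  # rows of `a` (index column included) whose coordinates occur in `b`
print(matched)
```

Because `np.unique` sorts the stacked rows, the cost is dominated by a sort rather than a pairwise scan. Matching is exact, so coordinates that differ by rounding error will not match.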


Two brief vectorized ways to do this with cdist -

    from scipy.spatial.distance import cdist

    out = a[np.any(cdist(a[:,1:],b)==0,axis=1)]

Or, if you don't mind getting a little voodoo-ish, here np.einsum replaces np.any -

    out = a[np.einsum('ij->i',cdist(a[:,1:],b)==0)]

Sample run -

    In [15]: from scipy.spatial.distance import cdist

    In [16]: a
    Out[16]:
    array([[  4. ,   2.2,   5. ],
           [  2. ,  -6.3,   0. ],
           [  3. ,   3.6,   8. ],
           [  5. ,  -9.8,  50. ]])

    In [17]: b
    Out[17]:
    array([[ 2.2,  5. ],
           [-6.3,  0. ],
           [ 3.6,  8. ]])

    In [18]: a[np.any(cdist(a[:,1:],b)==0,axis=1)]
    Out[18]:
    array([[ 4. ,  2.2,  5. ],
           [ 2. , -6.3,  0. ],
           [ 3. ,  3.6,  8. ]])

    In [19]: a[np.einsum('ij->i',cdist(a[:,1:],b)==0)]
    Out[19]:
    array([[ 4. ,  2.2,  5. ],
           [ 2. , -6.3,  0. ],
           [ 3. ,  3.6,  8. ]])
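Both cdist variants test for a distance of exactly zero. If the coordinates may differ by floating-point rounding, a tolerance-based comparison is one option. The following is a minimal sketch in plain NumPy that uses broadcasting instead of cdist; the tolerance `atol=1e-8` is an assumed value and not something given in the question:

```python
import numpy as np

a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])
b = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]])

# Pairwise, per-coordinate comparison; result has shape (len(a), len(b), n_coords).
close = np.isclose(a[:, None, 1:], b[None, :, :], rtol=0.0, atol=1e-8)

# A row of `a` matches when all of its coordinates agree with some row of `b`.
mask = close.all(axis=2).any(axis=1)

out = a[mask]
print(out)
```

Like the cdist approach, this builds a len(a) by len(b) intermediate array, so memory use grows with the product of the two sizes.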

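This is just a rough sketch for your question, using the sample arrays from your update:

    import numpy as np

    array1 = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])  # rows: [atomID, coordinates...]
    array2 = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]])  # rows: coordinates only
    rows = []
    for coords in array2:
        for element in array1:
            if np.array_equal(element[1:], coords):  # compare the coordinates of the two elements
                # insert the atomID at the beginning of the coordinate row
                rows.append(np.insert(coords, 0, element[0]))
                break
    result = np.array(rows)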

Using a list instead of an array for the np.insert values did the trick.

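    import numpy as np

    a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])
    b = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]])
    print(a)
    print(b)

    x = []  # collected index values, one per matching row of b
    for i in range(len(b)):
        for j in range(len(a)):
            # note: only the first coordinate column is compared
            if a[j,1]==b[i,0]:
                x.append(a[j,0])
    # insert the collected indices as a new first column of b
    print(np.insert(b,0,x,axis=1))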

which outputs:

    [[ 4.   2.2  5. ]
     [ 2.  -6.3  0. ]
     [ 3.   3.6  8. ]]
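The same collect-then-insert idea can be applied to the arrays from the question while comparing all three coordinates. This is a sketch under stated assumptions: the serial number is left out, the columns of `array1` are taken to be moleculeID, atomID, an unlabeled column, then x, y, z, and `atom_id_by_xyz` is a hypothetical name for a plain dictionary lookup. Matching is exact, so a coordinate missing from `array1` raises a `KeyError`:

```python
import numpy as np

# Assumed column layout: moleculeID, atomID, (unlabeled), x, y, z.
array1 = np.array([
    [1, 2, 0, 7.7590151, 7.2925348, 12.5933323],
    [1, 2, 0, 7.123642, 6.1970949, 11.5622416],
    [1, 6, 0, 6.944543, 7.0390449, 12.0713224],
    [1, 2, 0, 8.8900348, 11.5477333, 13.5633965],
    [1, 2, 0, 7.857268, 12.8062735, 13.4357052],
    [1, 6, 0, 8.2124357, 12.1004238, 14.0486889],
])
array2 = np.array([
    [7.7590151, 7.2925348, 12.5933323],
    [7.123642, 6.1970949, 11.5622416],
    [6.944543, 7.0390449, 12.0713224],
    [8.8900348, 11.5477333, 13.5633965],
])

# Map each full (x, y, z) tuple to its atomID (column 1 of array1).
atom_id_by_xyz = {tuple(row[3:]): row[1] for row in array1}

# Look up the atomID for every coordinate row of array2, in array2's order.
atom_ids = [atom_id_by_xyz[tuple(xyz)] for xyz in array2]

# Prepend the atomIDs as a new first column.
combined = np.insert(array2, 0, atom_ids, axis=1)
print(combined)
```

The dictionary lookup visits each row once, so it avoids the nested loop, and the result keeps the row order of `array2`.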