You can use np.in1d to find the elements of a that are in b. To get their indices, pass the resulting boolean mask to np.where:
```
In [34]: a = np.array([1, 2, 3, 4, 5])

In [35]: b = np.array([2, 4, 7])

In [36]: np.in1d(a, b)
Out[36]: array([False, True, False, True, False], dtype=bool)

In [39]: np.where(np.in1d(a, b))
Out[39]: (array([1, 3]),)
```
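The same lookup as a plain script, as a minimal sketch. It uses np.isin, since in recent NumPy releases np.in1d is deprecated in favour of np.isin (the two behave the same for 1-D inputs). It also uses np.flatnonzero, which returns the index array directly instead of the one-element tuple that np.where gives:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([2, 4, 7])

# Boolean mask: True where an element of a also occurs in b
mask = np.isin(a, b)        # [False, True, False, True, False]

# Positions of the True entries as a flat 1-D array
idx = np.flatnonzero(mask)  # [1, 3]
```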
Since a and b are already sorted (strictly, only b, the array being searched, needs to be sorted), you can use

```
In [57]: np.searchsorted(b, a, side='right') != np.searchsorted(b, a, side='left')
Out[57]: array([False, True, False, True, False], dtype=bool)
```

instead of np.in1d(a, b). For large a and b, using searchsorted can be faster:
```
import numpy as np

a = np.random.choice(10**7, size=10**6, replace=False)
a.sort()
b = np.random.choice(10**7, size=10**5, replace=False)
b.sort()

In [53]: %timeit np.in1d(a, b)
10 loops, best of 3: 176 ms per loop

In [54]: %timeit np.searchsorted(b, a, side='right') != np.searchsorted(b, a, side='left')
10 loops, best of 3: 106 ms per loop
```
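To see why comparing the two searchsorted results detects membership, here is a small trace on the toy arrays from above. For each x in a, side='left' gives the first position in b whose value is >= x, and side='right' gives the first position whose value is > x. The difference between the two is the number of copies of x in b, so they differ exactly when x is present:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([2, 4, 7])   # searchsorted requires b to be sorted

left = np.searchsorted(b, a, side='left')    # [0, 0, 1, 1, 2]
right = np.searchsorted(b, a, side='right')  # [0, 1, 1, 2, 2]

counts = right - left       # copies of each x in b: [0, 1, 0, 1, 0]
present = left != right     # [False, True, False, True, False]
```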
Jaime and Divakar have proposed significant improvements on the method shown above. Here is some code that checks that all methods return the same result, followed by some timings:
```
import numpy as np

a = np.random.choice(10**7, size=10**6, replace=False)
a.sort()
b = np.random.choice(10**7, size=10**5, replace=False)
b.sort()

def using_searchsorted(a, b):
    # Search each element of a in the sorted b; it is present
    # exactly when the left and right insertion points differ.
    return np.where(np.searchsorted(b, a, side='right')
                    != np.searchsorted(b, a, side='left'))[0]

def using_in1d(a, b):
    return np.where(np.in1d(a, b))[0]

def using_searchsorted_divakar(a, b):
    # Search the smaller b in the sorted a instead.
    idx1 = np.searchsorted(a, b, 'left')
    idx2 = np.searchsorted(a, b, 'right')
    out = idx1[idx1 != idx2]
    return out

def using_jaime_mask(haystack, needle):
    # A single search, then check that the value at each
    # insertion point really equals the needle.
    idx = np.searchsorted(haystack, needle)
    mask = idx < haystack.size  # drop insertion points past the end
    mask[mask] = haystack[idx[mask]] == needle[mask]
    idx = idx[mask]
    return idx

expected = using_searchsorted(a, b)
for func in (using_in1d, using_searchsorted_divakar, using_jaime_mask):
    result = func(a, b)
    # Integer index arrays: require an exact match, not approximate closeness
    assert np.array_equal(expected, result)
```
```
In [29]: %timeit using_jaime_mask(a, b)
100 loops, best of 3: 13 ms per loop

In [28]: %timeit using_searchsorted_divakar(a, b)
10 loops, best of 3: 21.7 ms per loop

In [26]: %timeit using_searchsorted(a, b)
10 loops, best of 3: 109 ms per loop

In [27]: %timeit using_in1d(a, b)
10 loops, best of 3: 173 ms per loop
```
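Both faster variants swap the roles of the arrays. They search the 10**5 elements of b in the sorted a, instead of searching all 10**6 elements of a in b. This is likely where most of the speed-up comes from, and Jaime's version also needs only one searchsorted call. The trace below runs both ideas on the toy arrays from the beginning. It also shows one caveat, using a hypothetical `a_dup` with a repeated value: when a contains duplicates, the b-in-a variants report only the first matching position of each value, while the in1d/isin-based version reports all of them. The benchmark avoids this because it draws a with replace=False.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([2, 4, 7])

# Divakar-style: search b in a; equal insertion points mean "absent"
idx1 = np.searchsorted(a, b, 'left')    # [1, 3, 5]
idx2 = np.searchsorted(a, b, 'right')   # [2, 4, 5]
divakar = idx1[idx1 != idx2]            # [1, 3]

# Jaime-style: one search, then verify the value at each insertion point
idx = np.searchsorted(a, b)             # [1, 3, 5]
mask = idx < a.size                     # 5 is past the end of a -> False
mask[mask] = a[idx[mask]] == b[mask]    # a[[1, 3]] == [2, 4] -> [True, True]
jaime = idx[mask]                       # [1, 3]

# Caveat with a hypothetical a_dup containing a repeated value
a_dup = np.array([1, 2, 2, 3])
lo = np.searchsorted(a_dup, [2], 'left')        # [1]
hi = np.searchsorted(a_dup, [2], 'right')       # [3]
dup_first = lo[lo != hi]                        # [1]: only the first 2
dup_all = np.flatnonzero(np.isin(a_dup, [2]))   # [1, 2]: every 2
```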