Here's one vectorized approach with np.searchsorted, based on this post -
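    import numpy as np

    def closest_argmin(A, B):
        L = B.size
        sidx_B = B.argsort()          # indices that sort B
        sorted_B = B[sidx_B]
        # Insertion position in sorted_B for each element of A
        sorted_idx = np.searchsorted(sorted_B, A)
        sorted_idx[sorted_idx == L] = L - 1   # clip positions past the end
        # True where the left neighbour is strictly closer than the right one
        mask = (sorted_idx > 0) & (
            np.abs(A - sorted_B[sorted_idx - 1]) < np.abs(A - sorted_B[sorted_idx])
        )
        # Step back by one where mask is True, then map to B's original indices
        return sidx_B[sorted_idx - mask]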
Brief explanation:
Get the sorted indices for the left positions. We do this with np.searchsorted(arr1, arr2, side='left') or just np.searchsorted(arr1, arr2). searchsorted expects a sorted array as its first input, so some preparation is needed: we argsort B and search in the sorted copy.
Compare the values at those left positions with the values at their immediate right positions (left + 1) and see which one is closer. We do this in the step that computes mask.
Depending on whether the left value or its immediate right neighbour is closer, choose the corresponding index. This is done by subtracting mask from the indices: the boolean mask values act as offsets and are converted to ints in the subtraction.
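The steps above can be traced on a small input. This is only a minimal sketch: the arrays A and B below are made-up illustrative values, not data from the question, and the intermediate results are shown in comments.

```python
import numpy as np

# Illustrative inputs (assumed for this sketch, not from the question)
A = np.array([0.1, 0.45, 0.9, 2.0])   # values to look up
B = np.array([1.0, 0.0, 0.5])         # unsorted reference values

# Preparation: searchsorted needs a sorted first argument
sidx_B = B.argsort()                  # [1, 2, 0]
sorted_B = B[sidx_B]                  # [0.0, 0.5, 1.0]

# Step 1: insertion positions, clipped so they remain valid indices
sorted_idx = np.searchsorted(sorted_B, A)       # [1, 1, 2, 3]
sorted_idx[sorted_idx == B.size] = B.size - 1   # [1, 1, 2, 2]

# Step 2: True where the neighbour one slot to the left is strictly closer
mask = (sorted_idx > 0) & (
    np.abs(A - sorted_B[sorted_idx - 1]) < np.abs(A - sorted_B[sorted_idx])
)                                     # [True, False, False, False]

# Step 3: the bool mask acts as a 0/1 offset; map back to indices into B
result = sidx_B[sorted_idx - mask]    # [1, 2, 0, 0]
print(result, B[result])
```

Here 0.1 is matched to B[1] = 0.0, 0.45 to B[2] = 0.5, and both 0.9 and 2.0 to B[0] = 1.0.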
Benchmarking
Original approach -
    def org_app(myArray, refArray):
        out1 = np.empty(myArray.size, dtype=int)
        for i, value in enumerate(myArray):
            # find_nearest from posted question
            index = find_nearest(refArray, value)
            out1[i] = index
        return out1
Timing and Verification -
    In [188]: refArray = np.random.random(16)
         ...: myArray = np.random.random(1000)
         ...:

    In [189]: %timeit org_app(myArray, refArray)
    100 loops, best of 3: 1.95 ms per loop

    In [190]: %timeit closest_argmin(myArray, refArray)
    10000 loops, best of 3: 36.6 µs per loop

    In [191]: np.allclose(closest_argmin(myArray, refArray), org_app(myArray, refArray))
    Out[191]: True
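The verification above relies on find_nearest from the question, which is not reproduced here. As an independent check, one could compare against a brute-force broadcasted argmin instead. The sketch below is an assumption on my part rather than the question's code: brute_force_argmin is a hypothetical helper name, and the seed and array sizes are arbitrary.

```python
import numpy as np

def closest_argmin(A, B):
    # Same searchsorted-based approach as in the answer
    L = B.size
    sidx_B = B.argsort()
    sorted_B = B[sidx_B]
    sorted_idx = np.searchsorted(sorted_B, A)
    sorted_idx[sorted_idx == L] = L - 1
    mask = (sorted_idx > 0) & (
        np.abs(A - sorted_B[sorted_idx - 1]) < np.abs(A - sorted_B[sorted_idx])
    )
    return sidx_B[sorted_idx - mask]

def brute_force_argmin(A, B):
    # Hypothetical reference: full |A_i - B_j| distance matrix, O(len(A) * len(B)) memory
    return np.abs(A[:, None] - B[None, :]).argmin(axis=1)

rng = np.random.default_rng(0)        # arbitrary seed, for repeatability only
refArray = rng.random(16)
myArray = rng.random(1000)

fast = closest_argmin(myArray, refArray)
slow = brute_force_argmin(myArray, refArray)

# Compare distances rather than raw indices, so exact ties cannot cause a false mismatch
same_distance = np.allclose(
    np.abs(myArray - refArray[fast]), np.abs(myArray - refArray[slow])
)
print(same_distance)
```

The brute-force version is fine for checking correctness on small inputs, but its memory use grows with the product of the two array sizes, which is the cost the searchsorted approach avoids.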
That's a 50x+ speedup for the posted sample, and hopefully more for larger datasets!
Divakar