Find nearest indexes for one array against all values in another array - Python / NumPy

I have a list of complex numbers for which I want to find the closest value in another list of complex numbers.

My current approach with numpy:

```python
import numpy as np

refArray = np.random.random(16)
myArray = np.random.random(1000)

def find_nearest(array, value):
    idx = (np.abs(array - value)).argmin()
    return idx

for value in np.nditer(myArray):
    index = find_nearest(refArray, value)
    print(index)
```

Unfortunately, this is slow for a large number of values. Is there a faster or more "pythonic" way of matching each value in myArray with the nearest refArray value?

FYI: I don't necessarily need to use numpy in my script.

Important: the order of both myArray and refArray matters and must not be changed. If sorting is applied, the original indices must be preserved somehow.

Answer

Here is one vectorized approach with np.searchsorted, based on this post:

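```python
import numpy as np

def closest_argmin(A, B):
    L = B.size
    sidx_B = B.argsort()                        # indices that sort B
    sorted_B = B[sidx_B]
    sorted_idx = np.searchsorted(sorted_B, A)   # left insertion points of A in sorted_B
    sorted_idx[sorted_idx == L] = L - 1         # clip positions past the end of sorted_B
    # True where the left neighbour sorted_B[sorted_idx - 1] is strictly closer
    mask = (sorted_idx > 0) & \
        (np.abs(A - sorted_B[sorted_idx - 1]) < np.abs(A - sorted_B[sorted_idx]))
    # step back by one where mask is True, then map back to indices into B
    return sidx_B[sorted_idx - mask]
```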
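To make the intermediate arrays concrete, here is the body of closest_argmin run step by step on a tiny hand-picked input. The values of A and B are illustrative only, not from the question. Printing each stage shows the left insertion points from np.searchsorted, the clipping of past-the-end positions, and the boolean mask that shifts an index one step to the left.

```python
import numpy as np

# Tiny illustrative inputs (not from the original post)
A = np.array([0.05, 0.42, 0.58, 0.95])
B = np.array([0.9, 0.1, 0.5])

L = B.size
sidx_B = B.argsort()                  # -> [1 2 0], the order that sorts B
sorted_B = B[sidx_B]                  # -> [0.1 0.5 0.9]
sorted_idx = np.searchsorted(sorted_B, A)
print("insertion points:", sorted_idx)          # 0.95 lands past the end (index 3)
sorted_idx[sorted_idx == L] = L - 1             # clip it to the last valid index
print("clipped:", sorted_idx)
left_closer = np.abs(A - sorted_B[sorted_idx - 1]) < np.abs(A - sorted_B[sorted_idx])
mask = (sorted_idx > 0) & left_closer           # never step left from index 0
print("mask:", mask)
result = sidx_B[sorted_idx - mask]              # positions in sorted_B -> indices into B
print("nearest indices into B:", result)
```

Only 0.58 gets mask True: its left neighbour 0.5 is closer than its right neighbour 0.9, so its position is shifted from 2 to 1 before being mapped back through sidx_B.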

Brief explanation:

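  • Get the left insertion positions. We do this using np.searchsorted(arr1, arr2, side='left'), or simply np.searchsorted(arr1, arr2), since side='left' is the default. searchsorted expects its first argument to be sorted, so some preparation is needed: we sort B with argsort and keep the sort indices sidx_B so we can map back to the original positions.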

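  • Compare each value of A with its two neighbours in sorted_B, the one just left of its insertion position (sorted_idx - 1) and the one at it (sorted_idx), and see which is closer. This is the step that computes mask. Beforehand, positions equal to L (values larger than every element of B) are clipped to L - 1 so the indexing stays in bounds.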

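  • Depending on which neighbour is closer, select the corresponding position. This is done by subtracting mask from the indices: the boolean mask values act as offsets of 1 or 0 once converted to ints. Finally, sidx_B maps positions in sorted_B back to indices into the original B, so the original order is preserved.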
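The three steps above are not specific to NumPy. Since the question says numpy is optional, here is a minimal pure-Python sketch of the same sort, binary-search and neighbour-comparison idea using the standard library's bisect module. The function name nearest_indices is hypothetical, and like closest_argmin it assumes real-valued inputs. It will be slower than the vectorized version on large arrays, but it is still O((N + M) log M) instead of the O(N * M) of the question's loop.

```python
from bisect import bisect_left

def nearest_indices(values, ref):
    """Hypothetical helper: for each x in values, the index of the nearest ref element.

    Assumes real-valued inputs. Returned indices refer to the original,
    unsorted ref list, and results follow the order of values.
    """
    if not ref:
        raise ValueError("ref must be non-empty")
    order = sorted(range(len(ref)), key=ref.__getitem__)  # argsort equivalent
    sorted_ref = [ref[i] for i in order]
    last = len(sorted_ref) - 1
    out = []
    for x in values:
        pos = min(bisect_left(sorted_ref, x), last)   # left insertion point, clipped
        # step back one if the left neighbour is strictly closer
        if pos > 0 and abs(x - sorted_ref[pos - 1]) < abs(x - sorted_ref[pos]):
            pos -= 1
        out.append(order[pos])                        # map back to original index
    return out

print(nearest_indices([0.05, 0.42, 0.58, 0.95], [0.9, 0.1, 0.5]))  # prints [1, 2, 2, 0]
```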

Benchmarking

Original approach -

```python
def org_app(myArray, refArray):
    out1 = np.empty(myArray.size, dtype=int)
    for i, value in enumerate(myArray):
        # find_nearest from posted question
        index = find_nearest(refArray, value)
        out1[i] = index
    return out1
```

Timing and Verification -

```
In [188]: refArray = np.random.random(16)
     ...: myArray = np.random.random(1000)
     ...:

In [189]: %timeit org_app(myArray, refArray)
100 loops, best of 3: 1.95 ms per loop

In [190]: %timeit closest_argmin(myArray, refArray)
10000 loops, best of 3: 36.6 µs per loop

In [191]: np.allclose(closest_argmin(myArray, refArray), org_app(myArray, refArray))
Out[191]: True
```

That is a 50x+ speedup for the posted sample (1.95 ms vs. 36.6 µs), and hopefully more for larger datasets!
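One caveat: the searchsorted approach relies on sorting, so it is only meaningful for real-valued arrays like the np.random.random sample used here. NumPy sorts complex numbers lexicographically (by real part, then by imaginary part), which does not correspond to distance in the complex plane, so for the complex numbers mentioned in the question it will not in general find the nearest value. When refArray is small, as in the 16-element example, a broadcasting sketch handles complex input directly, at the cost of a temporary array of myArray.size x refArray.size distances. The function name nearest_by_broadcasting is hypothetical.

```python
import numpy as np

def nearest_by_broadcasting(A, B):
    """Hypothetical helper: index into B of the element nearest to each A[i].

    Assumes 1-D arrays. Works for real or complex input: np.abs gives the
    complex modulus, so 'nearest' means smallest distance in the complex plane.
    Needs O(A.size * B.size) temporary memory.
    """
    dist = np.abs(A[:, None] - B[None, :])   # shape (A.size, B.size)
    return dist.argmin(axis=1)               # first minimum on ties, like find_nearest

rng = np.random.default_rng(0)
refArray = rng.random(16) + 1j * rng.random(16)
myArray = rng.random(1000) + 1j * rng.random(1000)
idx = nearest_by_broadcasting(myArray, refArray)   # one index per myArray element, in order
```

If refArray is large, the same computation can be done over chunks of myArray to keep the distance matrix small.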
