Find nearest indexes for one array against all values in another array - Python / NumPy

Question

I have a list of complex numbers for which I want to find the closest value in another list of complex numbers.

My current approach with numpy:

    import numpy as np

    refArray = np.random.random(16)
    myArray = np.random.random(1000)

    def find_nearest(array, value):
        idx = (np.abs(array - value)).argmin()
        return idx

    for value in np.nditer(myArray):
        index = find_nearest(refArray, value)
        print(index)

Unfortunately, this is slow for a large number of values. Is there a faster or more Pythonic way of matching each value in myArray with the nearest refArray value?

FYI: I don't necessarily need to use NumPy in my script.

Important: the order of both myArray and refArray is important and should not be changed. If sorting is to be applied, the original index must be somehow preserved.

python arrays list numpy

Alexander Jul 27 '17 at 11:30

1 answer

Divakar · Accepted Answer · 2017-07-27T12:05:15+0000

Here is one vectorized approach with np.searchsorted, based on this post -

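    def closest_argmin(A, B):
        L = B.size
        # Sort B, keeping the sort order so original indices can be recovered
        sidx_B = B.argsort()
        sorted_B = B[sidx_B]
        # Insertion index of each A value in sorted B
        sorted_idx = np.searchsorted(sorted_B, A)
        # Values beyond the last element of sorted B map to the last element
        sorted_idx[sorted_idx==L] = L-1
        # True where the left neighbour is strictly closer
        mask = (sorted_idx > 0) & \
        ((np.abs(A - sorted_B[sorted_idx-1]) < np.abs(A - sorted_B[sorted_idx])) )
        # Step back by one where mask is True, then map to indices into B
        return sidx_B[sorted_idx-mask]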

Brief explanation:

Get the insertion index of each value of A in the sorted version of B. We do this using np.searchsorted(arr1, arr2, side='left') or just np.searchsorted(arr1, arr2), since side='left' is the default. searchsorted expects the sorted array as its first argument, so B is argsorted first and the sort order is kept in sidx_B.
Compare each value of A with its two neighbours in sorted B, the element just left of the insertion index (index - 1) and the element at the insertion index, and see which one is closer. This is the step that computes mask.
Depending on whether the left or the right neighbour is closer, select the corresponding index. This is done by subtracting mask from the indices, with the boolean values acting as offsets that are converted to ints. Indexing sidx_B with the result maps the positions back to the original order of B.
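
To make the steps above concrete, here is a small trace on hand-picked inputs. The arrays A and B below are hypothetical values chosen only so that each intermediate result can be checked by hand; the logic mirrors closest_argmin as posted, and the inputs are assumed to be 1-D real arrays.

```python
import numpy as np

# Hypothetical inputs, small enough to verify by hand.
B = np.array([0.9, 0.1, 0.5])     # reference values, deliberately unsorted
A = np.array([0.2, 0.45, 0.95])   # values to match against B

# Sort B but remember where each sorted element came from.
sidx_B = B.argsort()              # original positions of the sorted elements
sorted_B = B[sidx_B]              # 0.1, 0.5, 0.9

# Insertion index of each A value in sorted_B (side='left' is the default).
sorted_idx = np.searchsorted(sorted_B, A)
# 0.95 lies past the last element, so its index is clipped to the last slot.
sorted_idx[sorted_idx == B.size] = B.size - 1

# True where the left neighbour is strictly closer than the right one.
mask = (sorted_idx > 0) & (
    np.abs(A - sorted_B[sorted_idx - 1]) < np.abs(A - sorted_B[sorted_idx])
)

# Subtracting the boolean mask steps back by one where it is True;
# sidx_B then maps sorted positions back to positions in the original B.
nearest = sidx_B[sorted_idx - mask]

for a, i in zip(A, nearest):
    print(f"{a} -> B[{i}] = {B[i]}")
```

Neither A nor B is reordered in place, so the returned indices refer to the original order of B, which is what the question asks for.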

Benchmarking

Original approach -

    def org_app(myArray, refArray):
        out1 = np.empty(myArray.size, dtype=int)
        for i, value in enumerate(myArray):
            # find_nearest from posted question
            index = find_nearest(refArray, value)
            out1[i] = index
        return out1

Timing and Verification -

    In [188]: refArray = np.random.random(16)
         ...: myArray = np.random.random(1000)
         ...:

    In [189]: %timeit org_app(myArray, refArray)
    100 loops, best of 3: 1.95 ms per loop

    In [190]: %timeit closest_argmin(myArray, refArray)
    10000 loops, best of 3: 36.6 µs per loop

    In [191]: np.allclose(closest_argmin(myArray, refArray), org_app(myArray, refArray))
    Out[191]: True

That is a 50x+ speedup for the posted sample, and hopefully more for larger datasets!
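
As a cross-check, and as a possible fallback for complex inputs like those mentioned in the question, here is a minimal brute-force sketch using broadcasting. It is not part of the approach above: nearest_bruteforce is a hypothetical helper name, it assumes 1-D inputs, and it builds the full len(values) x len(ref) distance matrix, so memory grows with the product of the two sizes. As far as I know, NumPy orders complex values lexicographically (real part first), so a sort-based search would not be a nearest-by-distance search for complex data, whereas np.abs of a complex difference is the distance in the complex plane.

```python
import numpy as np

def nearest_bruteforce(values, ref):
    """Return, for each element of values, the index of the closest ref element.

    Hypothetical helper for illustration; assumes 1-D inputs.
    """
    values = np.asarray(values)
    ref = np.asarray(ref)
    # Broadcasting yields a (len(values), len(ref)) matrix of distances;
    # argmin along axis 1 picks the closest reference for each value.
    return np.abs(values[:, None] - ref[None, :]).argmin(axis=1)

# Real-valued data with the same sizes as the posted sample.
rng = np.random.default_rng(0)
ref = rng.random(16)
vals = rng.random(1000)
idx = nearest_bruteforce(vals, ref)

# Complex-valued data: distances are measured in the complex plane.
zref = np.array([1 + 1j, -1 + 0j, 0 - 2j])
zvals = np.array([0.9 + 1.2j, -0.5 - 0.1j, 0.1 - 1.5j])
zidx = nearest_bruteforce(zvals, zref)
print(zidx)
```

For small reference arrays such as the 16-element refArray this is usually affordable; for two large arrays the searchsorted approach avoids the quadratic memory cost.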
