EDIT 2 A solution using KDTree can work very well if you can select multiple neighbors that guarantee that you will have a unique neighbor for each element in your array. With the following code:
def nearest_neighbors_kd_tree(x, y, k) : x, y = map(np.asarray, (x, y)) tree =scipy.spatial.cKDTree(y[:, None]) ordered_neighbors = tree.query(x[:, None], k)[1] nearest_neighbor = np.empty((len(x),), dtype=np.intp) nearest_neighbor.fill(-1) used_y = set() for j, neigh_j in enumerate(ordered_neighbors) : for k in neigh_j : if k not in used_y : nearest_neighbor[j] = k used_y.add(k) break return nearest_neighbor
and a sample of points n=1000 , I get:
In [9]: np.any(nearest_neighbors_kd_tree(x, y, 12) == -1) Out[9]: True In [10]: np.any(nearest_neighbors_kd_tree(x, y, 13) == -1) Out[10]: False
Thus, k=13 is optimal, and then time:
In [11]: %timeit nearest_neighbors_kd_tree(x, y, 13) 100 loops, best of 3: 9.26 ms per loop
But in the worst case, you may need k=1000 , and then:
In [12]: %timeit nearest_neighbors_kd_tree(x, y, 1000) 1 loops, best of 3: 424 ms per loop
Which is slower than other options:
In [13]: %timeit nearest_neighbors(x, y) 10 loops, best of 3: 60 ms per loop In [14]: %timeit nearest_neighbors_sorted(x, y) 10 loops, best of 3: 47.4 ms per loop
EDIT . Array sorting before searching is calculated for arrays of more than 1000 elements:
def nearest_neighbors_sorted(x, y) : x, y = map(np.asarray, (x, y)) y_idx = np.argsort(y) y = y[y_idx] nearest_neighbor = np.empty((len(x),), dtype=np.intp) for j, xj in enumerate(x) : idx = np.searchsorted(y, xj) if idx == len(y) or idx != 0 and y[idx] - xj > xj - y[idx-1] : idx -= 1 nearest_neighbor[j] = y_idx[idx] y = np.delete(y, idx) y_idx = np.delete(y_idx, idx) return nearest_neighbor
With an array of 10,000 elements long:
In [2]: %timeit nearest_neighbors_sorted(x, y) 1 loops, best of 3: 557 ms per loop In [3]: %timeit nearest_neighbors(x, y) 1 loops, best of 3: 1.53 s per loop
For smaller arrays, it is slightly worse.
You will need to iterate over all of your objects in order to implement the greedy algorithm of the nearest neighbor, if only to discard duplicates. With that in mind, this is the fastest I could come up with:
def nearest_neighbors(x, y) : x, y = map(np.asarray, (x, y)) y = y.copy() y_idx = np.arange(len(y)) nearest_neighbor = np.empty((len(x),), dtype=np.intp) for j, xj in enumerate(x) : idx = np.argmin(np.abs(y - xj)) nearest_neighbor[j] = y_idx[idx] y = np.delete(y, idx) y_idx = np.delete(y_idx, idx) return nearest_neighbor
And now with:
n = 1000 x = np.random.rand(n) y = np.random.rand(2*n)
I get:
In [11]: %timeit nearest_neighbors(x, y) 10 loops, best of 3: 52.4 ms per loop