Efficiently check Euclidean distance for a large number of objects in Python

In a route-planning algorithm, I am trying to filter a list of nodes based on their distance to another node. The lists are actually pulled from a coarse spatial partition of the scene; I use the term "cell" to refer to one volume in that partition, from which we have derived a list of nodes that are in close proximity to each other.

Now I am implementing this as:

    # SSCCE version of the core function
    def nodes_in_range(src, cell, maxDist):
        srcX, srcY, srcZ = src.x, src.y, src.z
        maxDistSq = maxDist ** 2
        for node in cell:
            distSq = (node.x - srcX) ** 2
            if distSq > maxDistSq:
                continue
            distSq += (node.y - srcY) ** 2
            if distSq > maxDistSq:
                continue
            distSq += (node.z - srcZ) ** 2
            if distSq <= maxDistSq:
                yield node, distSq ** 0.5  # fast sqrt

    from collections import namedtuple

    class Node(namedtuple('Node', ('ID', 'x', 'y', 'z'))):
        # actual class has assorted other properties
        pass

    # 1, 3 and 9 are <= 4.2 from Node(1)
    cell = [
        Node(1, 0, 0, 0),
        Node(2, -2, -3, 4),
        Node(3, .1, .2, .3),
        Node(4, 2.3, -3.3, -4.5),
        Node(5, -2.5, 4.5, 5),
        Node(6, 4, 3., 2.),
        Node(7, -2.46, 2.46, -2.47),
        Node(8, 2.45, -2.46, -2.47),
        Node(9, .5, .5, .1),
        Node(10, 5, 6, 7),
        # In practice, cells have upto 600 entries
    ]

    if __name__ == "__main__":
        for node, dist in nodes_in_range(cell[0], cell, 4.2):
            print("{:3n} {:5.2f}".format(node.ID, dist))

This routine gets called a lot (10^7+ times in some queries), so every bit of performance matters, and avoiding the full member-wise distance computation via the early-exit checks has actually helped.

What I'm trying to do is switch to numpy and arrange the cells so that I can vectorize. Here is what I have come up with:

    import numpy
    import numpy.linalg

    contarry = numpy.ascontiguousarray
    float32 = numpy.float32

    # The "np_cell" has two parts: the list of nodes and a vectorizable
    # array of their positions:
    #   np_cell[1][N] == numpy position of node np_cell[0][N]
    def make_np_cell(cell):
        return (
            cell,
            contarry([contarry((node.x, node.y, node.z), float32) for node in cell]),
        )

    # This version fails because norm returns a single value.
    def np_nodes_in_range1(srcPos, np_cell, maxDist):
        distances = numpy.linalg.norm(np_cell[1] - srcPos)
        for (node, dist) in zip(np_cell[0], distances):
            if dist <= maxDist:
                yield node, dist

    # This version fails because the distances it computes are wrong.
    def np_nodes_in_range2(srcPos, np_cell, maxDist):
        distances = numpy.linalg.norm(np_cell[1] - srcPos, ord=1, axis=1)
        for (node, dist) in zip(np_cell[0], distances):
            if dist <= maxDist:
                yield node, dist

    # This version doesn't vectorize and so performs poorly.
    def np_nodes_in_range3(srcPos, np_cell, maxDist):
        norm = numpy.linalg.norm
        for (node, pos) in zip(np_cell[0], np_cell[1]):
            dist = norm(srcPos - pos)
            if dist <= maxDist:
                yield node, dist

    if __name__ == "__main__":
        np_cell = make_np_cell(cell)
        srcPos = np_cell[1][0]  # Position column [1], first node [0]

        print("v1 - fails because it gets a single distance")
        try:
            for node, dist in np_nodes_in_range1(srcPos, np_cell, float32(4.2)):
                print("{:3n} {:5.2f}".format(node.ID, dist))
        except TypeError:
            print("distances was a single value")

        print("v2 - gets the wrong distance values")
        for node, dist in np_nodes_in_range2(srcPos, np_cell, float32(4.2)):
            print("{:3n} {:5.2f}".format(node.ID, dist))

        print("v3 - slower")
        for node, dist in np_nodes_in_range3(srcPos, np_cell, float32(4.2)):
            print("{:3n} {:5.2f}".format(node.ID, dist))

The full combined version is here - I included a v4, which tries using enumerate instead of zip and finds that it is about 12 µs slower.

Output Example:

      1  0.00
      3  0.37
      9  0.71
    v1 - fails because it gets a single distance
    distances was a single value
    v2 - gets the wrong distance values
      1  0.00
      3  0.60
      9  1.10
    v3 - slower
      1  0.00
      3  0.37
      9  0.71
    v4 - v2 using enumerate
      1  0.00
      3  0.60
      9  1.10

As for performance, we can check this with timeit. I inflate the number of nodes in a cell by simple multiplication:

    In [2]: from sscce import *

    In [3]: cell = cell * 32  # increase to 320 nodes

    In [4]: len(cell)
    Out[4]: 320

    In [5]: %timeit -n 1000 -r 7 sum(1 for _ in nodes_in_range(cell[0], cell, 4.2))
    1000 loops, best of 7: 742 µs per loop

    In [6]: np_cell = make_np_cell(cell)

    In [7]: srcPos = np_cell[1][0]

    In [8]: %timeit -n 1000 -r 7 sum(1 for _ in np_nodes_in_range2(srcPos, np_cell, numpy.float32(4.2)))
    1000 loops, best of 7: 136 µs per loop

    In [9]: %timeit -n 1000 -r 7 sum(1 for _ in np_nodes_in_range3(srcPos, np_cell, numpy.float32(4.2)))
    1000 loops, best of 7: 3.64 ms per loop

Key results:

    nodes_in_range        1000 loops, best of 7: 742 µs per loop
    np_nodes_in_range2    1000 loops, best of 7: 136 µs per loop
    np_nodes_in_range3    1000 loops, best of 7: 3.64 ms per loop   # OUCH

Questions:

  • What am I doing wrong with calculating vectorized distances?

     distances = numpy.linalg.norm(np_cell[1] - srcPos) 

    against

     distances = numpy.linalg.norm(np_cell[1] - srcPos, ord=1, axis=1) 
  • Is this a better approach?

  • The number of nodes per cell varies from a handful to several hundred. I am currently iterating over the cells one at a time, but it seems likely that I want to marshal a full set of candidates (nodes[], positions[]) despite the potential extra cost of building that list (I could always use a batching accumulator so that I try to fill it with, say, at least 1024 positions before draining it). But I think this line of thought is shaped by my use of a contiguous array. Should I be looking at something like:

     nodes_in_range(src, chain(cell.nodes for cell in scene if cell_in_range(boundingBox))) 

and not worry about trying to flatten it all out?
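
For concreteness, here is a rough sketch of the marshalling idea (scene, cell_in_range and cell.nodes are stand-ins from my real code, not part of the SSCCE above):

    from itertools import chain

    def candidate_nodes(scene, boundingBox):
        # Concatenate the node lists of every cell whose volume intersects the
        # query region, so one range check can run over the combined iterable.
        return chain.from_iterable(
            cell.nodes for cell in scene if cell_in_range(cell, boundingBox))

    # e.g. for node, dist in nodes_in_range(src, candidate_nodes(scene, bbox), 4.2): ...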

python vectorization numpy computational-geometry
1 answer
  • What am I doing wrong with calculating vectorized distances?

     distances = numpy.linalg.norm(np_cell[1] - srcPos) 

    against

     distances = numpy.linalg.norm(np_cell[1] - srcPos, ord=1, axis=1) 

Firstly, if axis=None, np.linalg.norm will calculate either the vector norm (if the input is 1D) or the matrix norm (if the input is multi-dimensional). Both are single scalar quantities.

Secondly, ord=1 means the L1 norm (i.e. Manhattan distance), not the Euclidean distance mentioned in your title.
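
For reference, here is a minimal sketch of the corrected vectorized version, reusing the np_cell/srcPos layout from your make_np_cell code (the function name is just for illustration). Leaving ord at its default (the 2-norm) and passing axis=1 reduces each row of the (N, 3) position array to a single Euclidean distance:

    import numpy

    def np_nodes_in_range_fixed(srcPos, np_cell, maxDist):
        # one Euclidean (2-norm) distance per row of the position array
        distances = numpy.linalg.norm(np_cell[1] - srcPos, axis=1)
        for node, dist in zip(np_cell[0], distances):
            if dist <= maxDist:
                yield node, dist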


  • Is this a better approach?

A k-d tree is likely to be much faster. You can use scipy.spatial.cKDTree to do a ball search, finding all nodes within some threshold distance of the query point:

    import numpy as np
    from scipy.spatial import cKDTree

    # it will be much easier (and faster) to deal with numpy arrays here (you could
    # always look up the corresponding node objects by index if you wanted to)
    X = np.array([(n.x, n.y, n.z) for n in cell])

    # construct a kD tree
    tree = cKDTree(X)

    # query it with the first point, find the indices of all points within a maximum
    # distance of 4.2 of the query point
    query_point = X[0]
    idx = tree.query_ball_point(query_point, r=4.2, p=2)

    # these indices are one out from yours, since they start at 0 rather than 1
    print(idx)
    # [0, 2, 8]

    # query_ball_point doesn't return the distances, but we can easily compute these
    # using broadcasting
    neighbor_points = X[idx]
    d = np.sqrt(((query_point[None, :] - neighbor_points) ** 2).sum(1))

    print(d)
    # [ 0.          0.37416574  0.71414284]
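
If you want the same (node, distance) pairs that your generator yields, a small wrapper along these lines should work; this is only a sketch, assuming cell, X and tree are built as above, and the function name is made up:

    def kd_nodes_in_range(cell, tree, X, src_index, maxDist):
        # indices of every point within maxDist of the source point
        idx = tree.query_ball_point(X[src_index], r=maxDist)
        # Euclidean distances to those points, via the same broadcasting trick
        d = np.sqrt(((X[src_index] - X[idx]) ** 2).sum(1))
        for i, dist in zip(idx, d):
            yield cell[i], dist

    for node, dist in kd_nodes_in_range(cell, tree, X, 0, 4.2):
        print("{:3n} {:5.2f}".format(node.ID, dist))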

Benchmarking:

Querying a cKDTree is very fast, even for a very large number of points:

    X = np.random.randn(10000000, 3)
    tree = cKDTree(X)

    %timeit tree.query_ball_point(np.random.randn(3), r=4.2)
    # 1 loops, best of 3: 229 ms per loop

As you mentioned in the comments, the example above is a much harsher benchmark than your data. Because of the chosen distance tolerance and the fact that the data are Gaussian (and therefore clustered around 0), the query matches about 99% of the 10 million points.

Here is a test with a tighter distance cutoff that matches approximately 30% of the points, as in your example:

    %timeit tree.query_ball_point((0., 0., 0.), r=1.2)
    # 10 loops, best of 3: 86 ms per loop

Obviously, this is using far more points than you have. For your example data:

    tree = cKDTree(np_cell[1])

    %timeit tree.query_ball_point(np_cell[1][0], r=4.2)
    # The slowest run took 4.26 times longer than the fastest. This could mean that an intermediate result is being cached
    # 100000 loops, best of 3: 16.9 µs per loop

This handily beats your np_nodes_in_range2 function on my machine:

    %timeit sum(1 for _ in np_nodes_in_range2(srcPos, np_cell, numpy.float32(4.2)))
    # The slowest run took 7.77 times longer than the fastest. This could mean that an intermediate result is being cached
    # 10000 loops, best of 3: 84.4 µs per loop

Other things to consider:

If you need to query many points at the same time, it’s more efficient to build a second tree and use query_ball_tree instead of query_ball_point:

    X = np.random.randn(100, 3)
    Y = np.random.randn(10, 3)

    tree1 = cKDTree(X)
    tree2 = cKDTree(Y)

    # indices contains a list-of-lists, where the ith sublist contains the indices
    # of the neighbours of Y[i] in X
    indices = tree2.query_ball_tree(tree1, r=4.2)

If you don't need the indices but just the number of points within the ball, it will probably be faster to use count_neighbors:

 n_neighbors = tree2.count_neighbors(tree1, r=4.2) 
