In a route-planning algorithm, I am trying to filter a list of nodes by their distance to a reference node. The lists come from a coarse spatial partitioning of the scene; I use the term "cell" for one volume in that partitioning, from which I get a list of nodes that are near one another.
Now I am implementing this as:
# SSCCE version of the core function
def nodes_in_range(src, cell, maxDist):
    srcX, srcY, srcZ = src.x, src.y, src.z
    maxDistSq = maxDist ** 2
    for node in cell:
        distSq = (node.x - srcX) ** 2
        if distSq > maxDistSq:
            continue
        distSq += (node.y - srcY) ** 2
        if distSq > maxDistSq:
            continue
        distSq += (node.z - srcZ) ** 2
        if distSq <= maxDistSq:
            yield node, distSq ** 0.5
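(The node definitions and sample data live in the full SSCCE/gist; judging by the cell * 32 == 320 step below, the real cell holds 10 nodes. Purely for reference, a minimal stand-in that makes the snippets here runnable, with coordinates reverse-engineered from the example output further down, might look like this:)

import collections

Node = collections.namedtuple("Node", ("ID", "x", "y", "z"))

# Hypothetical sample data, consistent with the example output below
# (nodes 1, 3 and 9 at distances 0.00, 0.37 and 0.71 from node 1).
cell = [
    Node(ID=1, x=0.0, y=0.0, z=0.0),
    Node(ID=3, x=0.3, y=0.2, z=0.1),
    Node(ID=9, x=0.5, y=0.5, z=0.1),
]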
This gets called a lot (10^7+ times in some queries), so every bit of performance matters, and avoiding the naive exhaustive check with those early-out comparisons really helped.
What I'd like to do is switch to numpy and arrange the cells so that I can vectorize. Here is what I've tried:
import numpy
import numpy.linalg

contarry = numpy.ascontiguousarray
float32 = numpy.float32

# An "np_cell" is a pair: the original list of nodes, plus a
# vectorizable numpy array of their positions.
# np_cell[1][N] == numpy position of node np_cell[0][N]
def make_np_cell(cell):
    return (
        cell,
        contarry([contarry((node.x, node.y, node.z), float32) for node in cell]),
    )

# This version fails because norm() without an axis reduces the
# whole array to a single scalar value.
def np_nodes_in_range1(srcPos, np_cell, maxDist):
    distances = numpy.linalg.norm(np_cell[1] - srcPos)
    for (node, dist) in zip(np_cell[0], distances):
        if dist <= maxDist:
            yield node, dist

# This version fails because the distances are wrong: ord=1 is the
# Manhattan norm, not the Euclidean norm.
def np_nodes_in_range2(srcPos, np_cell, maxDist):
    distances = numpy.linalg.norm(np_cell[1] - srcPos, ord=1, axis=1)
    for (node, dist) in zip(np_cell[0], distances):
        if dist <= maxDist:
            yield node, dist

# This version doesn't vectorize the distance calculation and so
# performs poorly.
def np_nodes_in_range3(srcPos, np_cell, maxDist):
    norm = numpy.linalg.norm
    for (node, pos) in zip(np_cell[0], np_cell[1]):
        dist = norm(srcPos - pos)
        if dist <= maxDist:
            yield node, dist

if __name__ == "__main__":
    # `cell` is the list of sample nodes from the SSCCE.
    np_cell = make_np_cell(cell)
    srcPos = np_cell[1][0]  # position column [1], first node [0]

    print("v1 - fails because it gets a single distance")
    try:
        for node, dist in np_nodes_in_range1(srcPos, np_cell, float32(4.2)):
            print("{:3n} {:5.2f}".format(node.ID, dist))
    except TypeError:
        print("distances was a single value")

    print("v2 - gets the wrong distance values")
    for node, dist in np_nodes_in_range2(srcPos, np_cell, float32(4.2)):
        print("{:3n} {:5.2f}".format(node.ID, dist))

    print("v3 - slower")
    for node, dist in np_nodes_in_range3(srcPos, np_cell, float32(4.2)):
        print("{:3n} {:5.2f}".format(node.ID, dist))
I've combined all of this into a gist here - it also includes a v4, which tries using enumerate instead of zip and turns out to be about 12 µs slower (a sketch of it follows below).
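v4 itself isn't reproduced above; from the description it is presumably just v2 with enumerate, along these lines:

def np_nodes_in_range4(srcPos, np_cell, maxDist):
    # Same (wrong) ord=1 distances as v2, but indexing via enumerate
    # instead of zipping against the node list.
    distances = numpy.linalg.norm(np_cell[1] - srcPos, ord=1, axis=1)
    for i, dist in enumerate(distances):
        if dist <= maxDist:
            yield np_cell[0][i], dist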
Example output:
  1  0.00
  3  0.37
  9  0.71
v1 - fails because it gets a single distance
distances was a single value
v2 - gets the wrong distance values
  1  0.00
  3  0.60
  9  1.10
v3 - slower
  1  0.00
  3  0.37
  9  0.71
v4 - v2 using enumerate
  1  0.00
  3  0.60
  9  1.10
We can check the performance with timeit, inflating the number of nodes in a cell by simple multiplication:
In [2]: from sscce import *
In [3]: cell = cell * 32  # increase to 320 nodes
In [4]: len(cell)
Out[4]: 320
In [5]: %timeit -n 1000 -r 7 sum(1 for _ in nodes_in_range(cell[0], cell, 4.2))
1000 loops, best of 7: 742 µs per loop
In [6]: np_cell = make_np_cell(cell)
In [7]: srcPos = np_cell[1][0]
In [8]: %timeit -n 1000 -r 7 sum(1 for _ in np_nodes_in_range2(srcPos, np_cell, numpy.float32(4.2)))
1000 loops, best of 7: 136 µs per loop
In [9]: %timeit -n 1000 -r 7 sum(1 for _ in np_nodes_in_range3(srcPos, np_cell, numpy.float32(4.2)))
1000 loops, best of 7: 3.64 ms per loop
Highlights:
nodes_in_range:     1000 loops, best of 7: 742 µs per loop
np_nodes_in_range2: 1000 loops, best of 7: 136 µs per loop
np_nodes_in_range3: 1000 loops, best of 7: 3.64 ms per loop
Questions:
What am I doing wrong in the vectorized distance calculation?

distances = numpy.linalg.norm(np_cell[1] - srcPos)

versus

distances = numpy.linalg.norm(np_cell[1] - srcPos, ord=1, axis=1)
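To illustrate the difference, here is how I understand the relevant calls to behave on a toy array (this assumes numpy >= 1.8, where norm gained the axis argument); if I'm reading the docs right, the last call is the per-row Euclidean distance I'm actually after:

import numpy

deltas = numpy.array([[3.0, 4.0, 0.0],
                      [1.0, 1.0, 1.0]])

# No axis: the whole array is reduced to one scalar (the 2-norm of
# the flattened array), which is why v1's result can't be iterated.
print(numpy.linalg.norm(deltas))                 # 5.2915...

# ord=1 with axis=1: the Manhattan norm of each row, which is why
# v2 yields the wrong distances.
print(numpy.linalg.norm(deltas, ord=1, axis=1))  # [ 7.  3.]

# Default ord (2) with axis=1: the Euclidean distance of each row.
print(numpy.linalg.norm(deltas, axis=1))         # [ 5.  1.7320...]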
Is there a better approach?
Cells vary in size from a few nodes to hundreds. I'm currently iterating cell by cell, but it seems likely that I'd want to marshal a full set of candidates (nodes[], positions[]) up front, despite the potential extra cost of building that list (I could always use a batching accumulator, so that I always try to fill it with, say, at least 1024 positions before draining it). But I suspect this line of thinking is shaped by my use of a contiguous array. Should I instead be looking at something like:
nodes_in_range(src, chain.from_iterable(cell.nodes for cell in scene if cell_in_range(cell, boundingBox)))
and not worry about trying to flatten everything?
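For concreteness, here is a rough sketch of the batching-accumulator idea, assuming the np_cell layout from make_np_cell above and the axis=1 Euclidean norm; batchSize, drain and batched_nodes_in_range are names I've made up:

def batched_nodes_in_range(srcPos, np_cells, maxDist, batchSize=1024):
    # Accumulate candidates across cells so each vectorized distance
    # pass sees at least batchSize positions (except the final flush).
    nodes, positions = [], []

    def drain():
        posArray = numpy.ascontiguousarray(positions, dtype=numpy.float32)
        distances = numpy.linalg.norm(posArray - srcPos, axis=1)
        for i in numpy.flatnonzero(distances <= maxDist):
            yield nodes[i], distances[i]
        del nodes[:], positions[:]

    for np_cell in np_cells:
        nodes.extend(np_cell[0])
        positions.extend(np_cell[1])
        if len(positions) >= batchSize:
            for hit in drain():
                yield hit

    if positions:  # flush the remainder
        for hit in drain():
            yield hit

The caller would feed it only the cells that pass the broad-phase test, e.g. (make_np_cell(c) for c in scene if cell_in_range(c, boundingBox)).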