Minimum distance between segments using numba jit, Python

I asked related questions on this site last week, trying to isolate the parts I didn't understand about using the @jit decorator with Numba in Python. However, I hit a wall, so I'll just post the whole problem.

The problem is computing the minimum distance between pairs of segments from a large set. Segments are represented by their start and end points in 3D. Mathematically, each segment is parameterized as [AB] = A + (B - A) * s, where s is in [0,1], and A and B are the start and end points of the segment. For two such segments, the minimum distance can be computed in closed form; the derivation can be found here.
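For reference, writing u = p1 - p0 and v = q1 - q0 for the two segment directions and w = p0 - q0 for the offset between their start points (my shorthand, matching the variable names in the code below), the linked derivation gives the parameters of the closest points on the two lines as

    s = \frac{(u\cdot v)(v\cdot w) - (v\cdot v)(u\cdot w)}{(u\cdot u)(v\cdot v) - (u\cdot v)^2},
    \qquad
    t = \frac{(u\cdot u)(v\cdot w) - (u\cdot v)(u\cdot w)}{(u\cdot u)(v\cdot v) - (u\cdot v)^2}

and the squared distance between the closest points is then |p0 + u s - (q0 + v t)|^2. (These are the unclamped line parameters; the code below does not clamp s and t to [0,1].)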

I already posted this problem in another thread, and the answer there was to replace the double loop in my code by vectorizing the problem, which, however, runs into memory problems for large sets of segments. So I decided to stick with the loops and use numba's jit instead.

Since the solution requires many dot products, and the numpy dot product is not supported by numba, I started by implementing my own three-dimensional dot product:

    import numpy as np
    from numba import jit, autojit, double, float64, float32, void, int32

    def my_dot(a, b):
        res = a[0]*b[0] + a[1]*b[1] + a[2]*b[2]
        return res

    dot_jit = jit(double(double[:], double[:]))(my_dot)  # I know, it's not of much use here.
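Just as a sanity check (not strictly part of the problem), the hand-rolled dot product can be compared against numpy's:

    a = np.random.random(3)
    b = np.random.random(3)
    assert np.allclose(dot_jit(a, b), np.dot(a, b))  # should agree to floating-point precision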

The function that computes the minimum distance between all pairs of N segments takes an Nx6 array as input (6 coordinates per segment):

    def compute_stuff(array_to_compute):
        N = len(array_to_compute)
        con_mat = np.zeros((N, N))
        for i in range(N):
            for j in range(i+1, N):
                p0 = array_to_compute[i, 0:3]
                p1 = array_to_compute[i, 3:6]
                q0 = array_to_compute[j, 0:3]
                q1 = array_to_compute[j, 3:6]
                s = ( dot_jit((p1-p0),(q1-q0))*dot_jit((q1-q0),(p0-q0))
                      - dot_jit((q1-q0),(q1-q0))*dot_jit((p1-p0),(p0-q0)) ) / (
                      dot_jit((p1-p0),(p1-p0))*dot_jit((q1-q0),(q1-q0))
                      - dot_jit((p1-p0),(q1-q0))**2 )
                t = ( dot_jit((p1-p0),(p1-p0))*dot_jit((q1-q0),(p0-q0))
                      - dot_jit((p1-p0),(q1-q0))*dot_jit((p1-p0),(p0-q0)) ) / (
                      dot_jit((p1-p0),(p1-p0))*dot_jit((q1-q0),(q1-q0))
                      - dot_jit((p1-p0),(q1-q0))**2 )
                con_mat[i, j] = np.sum( (p0+(p1-p0)*s - (q0+(q1-q0)*t))**2 )
        return con_mat

    fast_compute_stuff = jit(double[:,:](double[:,:]))(compute_stuff)

So, compute_stuff(arg) takes a 2D np.array (double[:,:]) as an argument, performs a bunch of (numba-supported?) operations, and returns another 2D np.array (double[:,:]). Nevertheless, with

    v = np.random.random( (100,6) )
    %timeit compute_stuff(v)
    %timeit fast_compute_stuff(v)

I get 134 ms and 123 ms per loop, respectively. Can you shed some light on why I am not getting a speedup? Any feedback would be highly appreciated.

Tags: python, numpy, jit, numba
1 answer

Here is my version of your code, which is much faster:

    @jit(nopython=True)
    def dot(a, b):
        res = a[0]*b[0] + a[1]*b[1] + a[2]*b[2]
        return res

    @jit
    def compute_stuff2(array_to_compute):
        N = array_to_compute.shape[0]
        con_mat = np.zeros((N, N))
        p0 = np.zeros(3)
        p1 = np.zeros(3)
        q0 = np.zeros(3)
        q1 = np.zeros(3)
        p0m1 = np.zeros(3)
        p1m0 = np.zeros(3)
        q0m1 = np.zeros(3)
        q1m0 = np.zeros(3)
        p0mq0 = np.zeros(3)
        for i in range(N):
            for j in range(i+1, N):
                for k in xrange(3):
                    p0[k] = array_to_compute[i, k]
                    p1[k] = array_to_compute[i, k+3]
                    q0[k] = array_to_compute[j, k]
                    q1[k] = array_to_compute[j, k+3]
                    p0m1[k] = p0[k] - p1[k]
                    p1m0[k] = -p0m1[k]
                    q0m1[k] = q0[k] - q1[k]
                    q1m0[k] = -q0m1[k]
                    p0mq0[k] = p0[k] - q0[k]
                s = ( dot(p1m0, q1m0)*dot(q1m0, p0mq0)
                      - dot(q1m0, q1m0)*dot(p1m0, p0mq0) ) / (
                      dot(p1m0, p1m0)*dot(q1m0, q1m0)
                      - dot(p1m0, q1m0)**2 )
                t = ( dot(p1m0, p1m0)*dot(q1m0, p0mq0)
                      - dot(p1m0, q1m0)*dot(p1m0, p0mq0) ) / (
                      dot(p1m0, p1m0)*dot(q1m0, q1m0)
                      - dot(p1m0, q1m0)**2 )
                for k in xrange(3):
                    con_mat[i, j] += (p0[k]+(p1[k]-p0[k])*s - (q0[k]+(q1[k]-q0[k])*t))**2
        return con_mat

And timings:

    In [38]: v = np.random.random( (100,6) )
    %timeit compute_stuff(v)
    %timeit fast_compute_stuff(v)
    %timeit compute_stuff2(v)
    np.allclose(compute_stuff2(v), compute_stuff(v))

    #10 loops, best of 3: 107 ms per loop
    #10 loops, best of 3: 108 ms per loop
    #10000 loops, best of 3: 114 µs per loop
    #True

My main strategies for speeding this up were:

  • Get rid of all the array expressions and unroll the vector operations explicitly (especially since your arrays are so small).
  • Avoid redundant calculations, in particular re-computing the same vector differences inside your dot calls.
  • Move all array creation out of the nested loops so that numba can potentially do loop lifting. This also avoids creating lots of small temporary arrays, which is expensive; it is better to allocate once and reuse the memory (see the toy sketch after this list).
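As a toy illustration of that last point (a sketch with made-up names, not your original code), the pattern is roughly:

    import numpy as np

    def pairwise_demo(array_to_compute):
        # One scratch buffer is allocated once and reused by every iteration,
        # instead of building a fresh temporary array for each pair.
        N = array_to_compute.shape[0]
        diff = np.zeros(3)
        total = 0.0
        for i in range(N):
            for j in range(i + 1, N):
                for k in range(3):
                    # element-wise fills avoid the temporary that
                    # array_to_compute[i, 0:3] - array_to_compute[j, 0:3] would create
                    diff[k] = array_to_compute[i, k] - array_to_compute[j, k]
                total += diff[0]**2 + diff[1]**2 + diff[2]**2
        return total

The same preallocate-and-fill structure is what compute_stuff2 above does with its p0/p1/q0/q1 buffers.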

Another thing to note: with recent versions of numba, what used to be called autojit (that is, letting numba infer the types from the inputs), and is now just the bare decorator without type signatures, is in my experience usually just as fast as declaring your input types explicitly.
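Concretely, the two decoration styles look like this (a minimal sketch; dot_typed and dot_lazy are just placeholder names):

    from numba import jit, double

    # Explicit signature: compiled eagerly for the declared argument/return types.
    @jit(double(double[:], double[:]))
    def dot_typed(a, b):
        return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]

    # Bare decorator: numba infers the types from the arguments on the first call.
    @jit
    def dot_lazy(a, b):
        return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]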

The timings were done with numba 0.17.0 on OS X, using the Anaconda Python distribution with Python 2.7.9.
