Numba 3x slower than numpy

We have the numily get_pos_neg_bitwise vector function, which uses the mask = [132 20 192] and df.shape (500e3, 4), which we want to speed up with numba.

from numba import jit import numpy as np from time import time def get_pos_neg_bitwise(df, mask): """ In [1]: print mask [132 20 192] In [1]: print df [[ 1 162 97 41] [ 0 136 135 171] ..., [ 0 245 30 73]] """ check = (np.bitwise_and(mask, df[:, 1:]) == mask).all(axis=1) pos = (df[:, 0] == 1) & check neg = (df[:, 0] == 0) & check pos = np.nonzero(pos)[0] neg = np.nonzero(neg)[0] return (pos, neg) 

Using the advice from @morningsun, we made this version of numba:

 @jit(nopython=True) def numba_get_pos_neg_bitwise(df, mask): posneg = np.zeros((df.shape[0], 2)) for idx in range(df.shape[0]): vandmask = np.bitwise_and(df[idx, 1:], mask) # numba fail with # if np.all(vandmask == mask): vandm_equal_m = 1 for i, val in enumerate(vandmask): if val != mask[i]: vandm_equal_m = 0 break if vandm_equal_m == 1: if df[idx, 0] == 1: posneg[idx, 0] = 1 else: posneg[idx, 1] = 1 pos = list(np.nonzero(posneg[:, 0])[0]) neg = list(np.nonzero(posneg[:, 1])[0]) return (pos, neg) 

But it is still 3 times slower than numpy one (~ 0.06s Vs ~ 0.02s).

 if __name__ == '__main__': df = np.array(np.random.randint(256, size=(int(500e3), 4))) df[:, 0] = np.random.randint(2, size=(1, df.shape[0])) # set target to 0 or 1 mask = np.array([132, 20, 192]) start = time() pos, neg = get_pos_neg_bitwise(df, mask) msg = '==> pos, neg made; p={}, n={} in [{:.4} s] numpy' print msg.format(len(pos), len(neg), time() - start) start = time() msg = '==> pos, neg made; p={}, n={} in [{:.4} s] numba' pos, neg = numba_get_pos_neg_bitwise(df, mask) print msg.format(len(pos), len(neg), time() - start) start = time() pos, neg = numba_get_pos_neg_bitwise(df, mask) print msg.format(len(pos), len(neg), time() - start) 

Did I miss something?

 In [1]: %run numba_test2.py ==> pos, neg made; p=3852, n=3957 in [0.02306 s] numpy ==> pos, neg made; p=3852, n=3957 in [0.3492 s] numba ==> pos, neg made; p=3852, n=3957 in [0.06425 s] numba In [1]: 
+7
python numpy numba
source share
1 answer

Try transferring the call to np.bitwise_and outside the loop, since numba cannot do anything to speed it up:

 @jit(nopython=True) def numba_get_pos_neg_bitwise(df, mask): posneg = np.zeros((df.shape[0], 2)) vandmask = np.bitwise_and(df[:, 1:], mask) for idx in range(df.shape[0]): # numba fail with # if np.all(vandmask == mask): vandm_equal_m = 1 for i, val in enumerate(vandmask[idx]): if val != mask[i]: vandm_equal_m = 0 break if vandm_equal_m == 1: if df[idx, 0] == 1: posneg[idx, 0] = 1 else: posneg[idx, 1] = 1 pos = np.nonzero(posneg[:, 0])[0] neg = np.nonzero(posneg[:, 1])[0] return (pos, neg) 

Then I get the timings:

 ==> pos, neg made; p=3920, n=4023 in [0.02352 s] numpy ==> pos, neg made; p=3920, n=4023 in [0.2896 s] numba ==> pos, neg made; p=3920, n=4023 in [0.01539 s] numba 

So now numba is a bit faster than numpy.

Also, it didn't matter much, but in the original function you are returning numpy arrays, while in the numba version you have converted pos and neg to lists.

In general, I would suggest that numpy functions predominate in function calls, which numba cannot speed up, and the numpy version of the code already uses fast vectorization routines.

Update:

You can do this faster by removing the enumerate call and index directly into the array instead of capturing a slice. Separating pos and neg into separate arrays helps to avoid slicing along an non-adjacent axis in memory:

 @jit(nopython=True) def numba_get_pos_neg_bitwise(df, mask): pos = np.zeros(df.shape[0]) neg = np.zeros(df.shape[0]) vandmask = np.bitwise_and(df[:, 1:], mask) for idx in range(df.shape[0]): # numba fail with # if np.all(vandmask == mask): vandm_equal_m = 1 for i in xrange(vandmask.shape[1]): if vandmask[idx,i] != mask[i]: vandm_equal_m = 0 break if vandm_equal_m == 1: if df[idx, 0] == 1: pos[idx] = 1 else: neg[idx] = 1 pos = np.nonzero(pos)[0] neg = np.nonzero(neg)[0] return pos, neg 

And timings in ipython laptop:

  %timeit pos1, neg1 = get_pos_neg_bitwise(df, mask) %timeit pos2, neg2 = numba_get_pos_neg_bitwise(df, mask)​ 100 loops, best of 3: 18.2 ms per loop 100 loops, best of 3: 7.89 ms per loop 
+10
source share

All Articles