Passing python function to numpy arrays

Say we have a particularly simple function like

import scipy as sp def func(x, y): return x + y 

This function obviously works for several python x and y built-in data types, such as string, list, int, float, array, etc. Since we are particularly interested in arrays, we consider two arrays:

 x = sp.array([-2, -1, 0, 1, 2]) y = sp.array([-2, -1, 0, 1, 2]) xx = x[:, sp.newaxis] yy = y[sp.newaxis, :] >>> func(xx, yy) 

it returns

 array([[-4, -3, -2, -1, 0], [-3, -2, -1, 0, 1], [-2, -1, 0, 1, 2], [-1, 0, 1, 2, 3], [ 0, 1, 2, 3, 4]]) 

as expected.

Now, what if someone wants to throw in arrays as inputs for the next function?

 def func2(x, y): if x > y: return x + y else: return x - y 

Doing >>>func(xx, yy) will result in an error.

The first obvious method that could be used is the sp.vectorize function in scipy / numpy. This method, however, was not very effective. Can anyone think of a more reliable way to translate any function as a whole to numpy arrays?

If re-writing code in an array-friendly way is the only way, this will help if you can also mention it here.

+7
source share
3 answers

np.vectorize is a common way to convert Python functions that work with numbers to numpy functions that work with ndarrays.

However, as you point out, this is not very fast, as it uses the Python loop under the hood.

To achieve a higher speed, you need to manually use a function that expects input of numpy arrays as input and takes advantage of this numpy-ness:

 import numpy as np def func2(x, y): return np.where(x>y,x+y,xy) x = np.array([-2, -1, 0, 1, 2]) y = np.array([-2, -1, 0, 1, 2]) xx = x[:, np.newaxis] yy = y[np.newaxis, :] print(func2(xx, yy)) # [[ 0 -1 -2 -3 -4] # [-3 0 -1 -2 -3] # [-2 -1 0 -1 -2] # [-1 0 1 0 -1] # [ 0 1 2 3 0]] 

Regarding performance:

test.py

 import numpy as np def func2a(x, y): return np.where(x>y,x+y,xy) def func2b(x, y): ind=x>y z=np.empty(ind.shape,dtype=x.dtype) z[ind]=(x+y)[ind] z[~ind]=(xy)[~ind] return z def func2c(x, y): # x, y= x[:, None], y[None, :] A, L= x+ y, x<= y A[L]= (x- y)[L] return A N=40 x = np.random.random(N) y = np.random.random(N) xx = x[:, np.newaxis] yy = y[np.newaxis, :] 

Launch:

With N = 30:

 % python -mtimeit -s'import test' 'test.func2a(test.xx,test.yy)' 1000 loops, best of 3: 219 usec per loop % python -mtimeit -s'import test' 'test.func2b(test.xx,test.yy)' 1000 loops, best of 3: 488 usec per loop % python -mtimeit -s'import test' 'test.func2c(test.xx,test.yy)' 1000 loops, best of 3: 248 usec per loop 

With N = 1000:

 % python -mtimeit -s'import test' 'test.func2a(test.xx,test.yy)' 10 loops, best of 3: 93.7 msec per loop % python -mtimeit -s'import test' 'test.func2b(test.xx,test.yy)' 10 loops, best of 3: 367 msec per loop % python -mtimeit -s'import test' 'test.func2c(test.xx,test.yy)' 10 loops, best of 3: 186 msec per loop 

This seems to mean that func2a slightly faster than func2c (and func2b very slow).

+11
source

For this special case, you can also write a function that works with both NumPy arrays and regular Python boards:

 def func2d(x, y): z = 2.0 * (x > y) - 1.0 z *= y return x + z 

This version is also more than four times faster than unutbu func2a() (tested with N = 100 ).

+10
source

To get the basic idea, you can change your function, for example, like this:

 def func2(x, y): x, y= x[:, None], y[None, :] A= x+ y A[x<= y]= (x- y)[x<= y] return A 

So in your case, something like this should be a very reasonable starting point:

 In []: def func(x, y): ..: x, y= x[:, None], y[None, :] ..: return x+ y ..: In []: def func2(x, y): ..: x, y= x[:, None], y[None, :] ..: A, L= x+ y, x<= y ..: A[L]= (x- y)[L] ..: return A ..: In []: x, y= arange(-2, 3), arange(-2, 3) In []: func(x, y) Out[]: array([[-4, -3, -2, -1, 0], [-3, -2, -1, 0, 1], [-2, -1, 0, 1, 2], [-1, 0, 1, 2, 3], [ 0, 1, 2, 3, 4]]) In []: func2(x, y) Out[]: array([[ 0, -1, -2, -3, -4], [-3, 0, -1, -2, -3], [-2, -1, 0, -1, -2], [-1, 0, 1, 0, -1], [ 0, 1, 2, 3, 0]]) 

Although this type of processing may seem unnecessary, it is not necessary. Always measure the actual performance of your programs and make the necessary (not earlier) necessary changes.

IMHO for added value: this type of "vectorization" makes your code truly consistent and readable in the long run.

+1
source

All Articles