How to optimize calculations for this function in numpy?

Question

How to optimize calculations for this function in numpy?

I want to implement the following problem in numpy and here is my code.

I tried the following numpy code for this problem with one for the loop. I am wondering if there is a more efficient way to do this calculation? I really appreciate that!

k, d = X.shape m = Y.shape[0] c1 = 2.0*sigma**2 c2 = 0.5*np.log(np.pi*c1) c3 = np.log(1.0/k) L_B = np.zeros((m,)) for i in xrange(m): if i % 100 == 0: print i L_B[i] = np.log(np.sum(np.exp(np.sum(-np.divide( np.power(XY[i,:],2), c1)-c2,1)+c3))) print np.mean(L_B)

I thought of np.expand_dims(X, 2).repeat(Y.shape[0], 2)-Y , creating a three-dimensional tensor, so the next calculation can be done using broadcast transmission, but it will waste a lot of memory when m great.

I also believe that np.einsum() uses nothing but a for loop, so it may not be as efficient, correct me if I'm wrong.

Any thought?

+7

python python-2.7 numpy machine-learning kernel-density

xxx222 Apr 22 '17 at 17:32

source share

1 answer

Divakar · Accepted Answer · 2017-04-22T18:06:59+0000

Optimization Stage No. 1

My first level of optimizations using direct translation of the loop code to broadcasting based on one when introducing a new axis and, as such, is not as memory efficient as indicated below -

 p1 = (-((X[:,None] - Y)**2)/c1)-c2 p11 = p1.sum(2) p2 = np.exp(p11+c3) out = np.log(p2.sum(0)).mean()

Optimization Stage No. 2

Having given several optimizations, bearing in mind that we intend to single out operations on constants, I got the following:

 c10 = -c1 c20 = X.shape[1]*c2 subs = (X[:,None] - Y)**2 p00 = subs.sum(2) p10 = p00/c10 p11 = p10-c20 p2 = np.exp(p11+c3) out = np.log(p2.sum(0)).mean()

Optimization Stage No. 3

Going further, and seeing places where I can optimize operations, I ended up using Scipy cdist to replace heavy squared and sum-reduction . This should be pretty memory efficient and give us the final implementation, as shown below -

 from scipy.spatial.distance import cdist # Setup constants c10 = -c1 c20 = X.shape[1]*c2 c30 = c20-c3 c40 = np.exp(c30) c50 = np.log(c40) # Get stagewise operations corresponding to loopy ones p1 = cdist(X, Y, 'sqeuclidean') p2 = np.exp(p1/c10).sum(0) out = np.log(p2).mean() - c50

Runtime test

Approaches -

 def loopy_app(X, Y, sigma): k, d = X.shape m = Y.shape[0] c1 = 2.0*sigma**2 c2 = 0.5*np.log(np.pi*c1) c3 = np.log(1.0/k) L_B = np.zeros((m,)) for i in xrange(m): L_B[i] = np.log(np.sum(np.exp(np.sum(-np.divide( np.power(XY[i,:],2), c1)-c2,1)+c3))) return np.mean(L_B) def vectorized_app(X, Y, sigma): # Setup constants k, d = D_A.shape c1 = 2.0*sigma**2 c2 = 0.5*np.log(np.pi*c1) c3 = np.log(1.0/k) c10 = -c1 c20 = X.shape[1]*c2 c30 = c20-c3 c40 = np.exp(c30) c50 = np.log(c40) # Get stagewise operations corresponding to loopy ones p1 = cdist(X, Y, 'sqeuclidean') p2 = np.exp(p1/c10).sum(0) out = np.log(p2).mean() - c50 return out

Timing and Verification -

 In [294]: # Setup inputs with m(=D_B.shape[0]) being a large number ...: X = np.random.randint(0,9,(100,10)) ...: Y = np.random.randint(0,9,(10000,10)) ...: sigma = 2.34 ...: In [295]: np.allclose(loopy_app(X, Y, sigma),vectorized_app(X, Y, sigma)) Out[295]: True In [296]: %timeit loopy_app(X, Y, sigma) 1 loops, best of 3: 225 ms per loop In [297]: %timeit vectorized_app(X, Y, sigma) 10 loops, best of 3: 23.6 ms per loop In [298]: # Setup inputs with m(=Y.shape[0]) being a much large number ...: X = np.random.randint(0,9,(100,10)) ...: Y = np.random.randint(0,9,(100000,10)) ...: sigma = 2.34 ...: In [299]: np.allclose(loopy_app(X, Y, sigma),vectorized_app(X, Y, sigma)) Out[299]: True In [300]: %timeit loopy_app(X, Y, sigma) 1 loops, best of 3: 2.27 s per loop In [301]: %timeit vectorized_app(X, Y, sigma) 1 loops, best of 3: 243 ms per loop

Around 10x speed there!

How to optimize calculations for this function in numpy?

More articles: