The Fastest Way to Use NumPy - Multidimensional Sums and Dot Products

I have these variables with the following sizes:

    A   - (3,)
    B   - (4,)
    X_r - (3,K,N,nS)
    X_u - (4,K,N,nS)
    k   - (K,)

and I want to calculate (A.dot(X_r[:,:,n,s])*B.dot(X_u[:,:,n,s])).dot(k) for all possible n and s. This is how I do it now:

    np.array([[(A.dot(X_r[:,:,n,s])*B.dot(X_u[:,:,n,s])).dot(k)
               for n in xrange(N)]
              for s in xrange(nS)])  # nS x N

But this is very slow, and I was wondering whether there is a better way to do it.

There is another calculation I am doing, however, that I am sure can be optimized:

    np.sum(np.array([(X_r[:,:,n,s]*B.dot(X_u[:,:,n,s])).dot(k)
                     for n in xrange(N)]),
           axis=0)

Here I create a NumPy array just to sum it over one axis, and then discard the array. If it were a 1-D list, I would use reduce; what should I use for NumPy arrays?

2 answers

Using multiple np.einsum calls -

    # Calculation of A.dot(X_r[:,:,n,s])
    p1 = np.einsum('i,ijkl->jkl', A, X_r)
    # Calculation of B.dot(X_u[:,:,n,s])
    p2 = np.einsum('i,ijkl->jkl', B, X_u)
    # Include the .dot(k) part to get the final output
    out = np.einsum('ijk,i->kj', p1*p2, k)
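As a sanity check, this einsum version can be compared against the question's double loop. The sketch below uses small arbitrary sizes and random data (K, N, nS and the seed are assumptions, not taken from the question), with `range` in place of Python 2's `xrange`:

```python
import numpy as np

# Small arbitrary sizes for a quick check (assumed, not from the question)
K, N, nS = 5, 6, 7
rng = np.random.default_rng(0)
A   = rng.random(3)
B   = rng.random(4)
X_r = rng.random((3, K, N, nS))
X_u = rng.random((4, K, N, nS))
k   = rng.random(K)

# einsum version: contract the leading axis of X_r/X_u, then the K axis
p1  = np.einsum('i,ijkl->jkl', A, X_r)
p2  = np.einsum('i,ijkl->jkl', B, X_u)
out = np.einsum('ijk,i->kj', p1 * p2, k)          # shape (nS, N)

# reference: the question's original double loop
ref = np.array([[(A.dot(X_r[:, :, n, s]) * B.dot(X_u[:, :, n, s])).dot(k)
                 for n in range(N)]
                for s in range(nS)])

print(np.allclose(out, ref))
```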

For the second example, this solves it:

    p1  = np.einsum('i,ijkl->jkl', B, X_u)                # OUT_DIM - (K,N,nS)
    sol = np.einsum('ijkl,j->il', X_r*p1[None,:,:,:], k)  # OUT_DIM - (3,nS)
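This one folds the sum over n into the einsum as well, so it produces the per-s results all at once. A quick check against the question's loop, again with small arbitrary shapes and random data (assumptions, not from the question):

```python
import numpy as np

# Small arbitrary sizes for a quick check (assumed, not from the question)
K, N, nS = 5, 6, 7
rng = np.random.default_rng(0)
B   = rng.random(4)
X_r = rng.random((3, K, N, nS))
X_u = rng.random((4, K, N, nS))
k   = rng.random(K)

# einsum version: sums over both the K axis and the n axis at once
p1  = np.einsum('i,ijkl->jkl', B, X_u)
sol = np.einsum('ijkl,j->il', X_r * p1[None, :, :, :], k)   # (3, nS)

# reference: the question's loop, evaluated for every s
ref = np.stack([np.sum([(X_r[:, :, n, s] * B.dot(X_u[:, :, n, s])).dot(k)
                        for n in range(N)], axis=0)
                for s in range(nS)], axis=1)

print(np.allclose(sol, ref))
```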

You can use dot to multiply arrays in higher dimensions, but the indices to be contracted have to come last. If we reorder your arrays

    X_r_t = X_r.transpose(2,3,0,1)
    X_u_t = X_u.transpose(2,3,0,1)

we get for your first expression

 res1_imp = (A.dot(X_r_t)*B.dot(X_u_t)).dot(k).T # shape nS x N 

and for the second expression

 res2_imp = np.sum((X_r_t * B.dot(X_u_t)[:,:,None,:]).dot(k),axis=0)[-1] 
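The transpose-then-dot trick works because, for a 1-D first argument, np.dot sums over the second-to-last axis of the higher-dimensional operand. A sketch verifying the first expression against the question's double loop (small arbitrary shapes and random data, assumed for illustration):

```python
import numpy as np

# Small arbitrary sizes for a quick check (assumed, not from the question)
K, N, nS = 5, 6, 7
rng = np.random.default_rng(1)
A   = rng.random(3)
B   = rng.random(4)
X_r = rng.random((3, K, N, nS))
X_u = rng.random((4, K, N, nS))
k   = rng.random(K)

# Move the contracted axes to the back so dot can act on them
X_r_t = X_r.transpose(2, 3, 0, 1)   # (N, nS, 3, K)
X_u_t = X_u.transpose(2, 3, 0, 1)   # (N, nS, 4, K)
res1_imp = (A.dot(X_r_t) * B.dot(X_u_t)).dot(k).T   # (nS, N)

# reference: the question's original double loop
ref = np.array([[(A.dot(X_r[:, :, n, s]) * B.dot(X_u[:, :, n, s])).dot(k)
                 for n in range(N)]
                for s in range(nS)])

print(np.allclose(res1_imp, ref))
```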

Timings

Divakar's solution gives on my computer: 10000 loops, best of 3: 21.7 µs per loop

My solution gives: 10000 loops, best of 3: 101 µs per loop

Edit

The timings above included the calculation of both expressions. When I include only the first expression (like Divakar), I get 10000 loops, best of 3: 41 µs per loop ... which is still slower, but closer to his timing.

