Python: manipulating large matrix data

Suppose I have a large set of small matrices (N = 1e6, each of size 3x3). I need to manipulate these matrices several times in my code: einsum, matrix inversion, and so on. To give an idea, I want to do something like the following.

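import numpy as np
import numpy.random as rd

# Array shapes must be integers, so convert the float literals.
ndata, kdata = int(1e6), int(1e5)

x = rd.normal(0, 1, (ndata, kdata, 3, 3))
y = rd.normal(0, 1, (ndata, kdata, 3, 3))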
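For a sense of scale, here is a rough estimate of the memory one such array would need, assuming float64 elements (the default for rd.normal). The names n_elements and bytes_per_array are illustrative and do not appear in the original code.

```python
import numpy as np

# Shape from the question: ndata x kdata matrices, each 3x3.
ndata, kdata = 10**6, 10**5
n_elements = ndata * kdata * 3 * 3

# Assumption: float64 storage, i.e. 8 bytes per element.
bytes_per_array = n_elements * np.dtype(np.float64).itemsize

print(f"{bytes_per_array / 1e12:.1f} TB per array")  # 7.2 TB per array
```

Since x, y, and the result each need this much, the full-size shapes are best read as illustrative; any practical run has to use smaller sizes or process the data in pieces.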

For small ndata and kdata, the following is an efficient and convenient approach:

xy = np.einsum('pqrs,pqsu->pqru', x, y)

Since I have large ndata and kdata, the above approach runs into memory problems, so the next best option is a dot product inside a nested loop over ndata and kdata, as follows:

xyloop1 = np.empty((ndata, kdata, 3, 3))

for j in range(ndata):
    for k in range(kdata):
        # One 3x3 matrix product per (j, k) pair
        xyloop1[j, k] = np.dot(x[j, k], y[j, k])

From what I have been taught, for loops are slow in Python. I also want to take advantage of numpy, so I think a blocked approach, which applies einsum to one block of matrices at a time, is preferable:

nstep = 200
ndiv = ndata // nstep  # integer division; assumes nstep divides ndata

kstep = 200
kdiv = kdata // kstep  # assumes kstep divides kdata

xyloop2 = np.empty((ndata, kdata, 3, 3))

for j in range(ndiv):
    ji, jf = j * nstep, (j + 1) * nstep
    for k in range(kdiv):
        ki, kf = k * kstep, (k + 1) * kstep
        xyloop2[ji:jf, ki:kf] = np.einsum('pqrs,pqsu->pqru', x[ji:jf, ki:kf], y[ji:jf, ki:kf])
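As a minimal sketch of the blocked idea, the following runs it on deliberately small, hypothetical sizes and checks it against a single einsum call. Stepping the range by the block size, rather than precomputing the number of blocks, is a swapped-in variation that also handles sizes that are not multiples of the block size. The sizes and the names xy_blocked and block are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small, hypothetical sizes so the check runs quickly; deliberately not
# multiples of the block sizes.
ndata, kdata = 50, 30
nstep, kstep = 16, 8

x = rng.normal(0, 1, (ndata, kdata, 3, 3))
y = rng.normal(0, 1, (ndata, kdata, 3, 3))

# Reference: one einsum call over the whole array.
xy = np.einsum('pqrs,pqsu->pqru', x, y)

# Blocked version: slicing past the end of an axis is truncated by numpy,
# so the last, possibly shorter, block needs no special handling.
xy_blocked = np.empty_like(xy)
for ji in range(0, ndata, nstep):
    for ki in range(0, kdata, kstep):
        block = (slice(ji, ji + nstep), slice(ki, ki + kstep))
        xy_blocked[block] = np.einsum('pqrs,pqsu->pqru', x[block], y[block])

print(np.allclose(xy, xy_blocked))
```

The block size trades Python loop overhead against the size of the temporaries each call creates, so it is worth tuning on the actual machine.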

So there are three versions of the same product: xy, xyloop1, and xyloop2. [...]


+4
2 answers

A few things to keep in mind when working with numpy:

  • numpy 1000 /, 2 , np.dot , 27 .
  • for-loop python, , ( C- ).
  • For N-dimensional arrays [...], numpy [...] np.einsum. An alternative is C = np.sum(A[..., :, None] * B[..., None, :, :], axis=-2) [...].
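The broadcast-and-sum expression in the last bullet can be checked on a small example. This is a sketch under assumed shapes: A and B are hypothetical batches of 3x3 matrices, and np.matmul is included only as an additional cross-check, not as something the answer itself proposes.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 5, 3, 3))  # hypothetical small batch of 3x3 matrices
B = rng.normal(size=(4, 5, 3, 3))

# A[..., :, None]    -> shape (4, 5, 3, 3, 1), indices (p, q, r, s, -)
# B[..., None, :, :] -> shape (4, 5, 1, 3, 3), indices (p, q, -, s, u)
# The product has shape (4, 5, 3, 3, 3); summing over s (axis -2)
# leaves indices (p, q, r, u).
C_sum = np.sum(A[..., :, None] * B[..., None, :, :], axis=-2)

C_einsum = np.einsum('pqrs,pqsu->pqru', A, B)
C_matmul = np.matmul(A, B)  # matmul broadcasts over the leading axes

print(np.allclose(C_sum, C_einsum), np.allclose(C_sum, C_matmul))
```

Note that the intermediate product holds three times as many elements as the result here, which matters when memory is already the bottleneck.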

So, for your case, something like:

xyloop2 = np.empty((ndata, kdata, 3, 3))

for i in range(ndata):
    # (kdata, 3, 3, 1) * (kdata, 1, 3, 3), summed over the shared index
    xyloop2[i] = np.sum(x[i, :, :, :, None] * y[i, :, None, :, :], axis=-2)

2, ( ) . , , .

+2

First, einsum is not always the fastest option. With p, q, r, s = 100, 50, 3, 3:

I:

%timeit tt=np.einsum('pqrs, pqsu->pqru',x,y)
100 loops, best of 3: 3.45 ms per loop

%timeit zz= np.sum(x[:,:,:,None,:]*y[:,:,:,None],axis=-2)
10000 loops, best of 3: 153 µs per loop

II:

%timeit zz= np.sum(x[:,:,:,None,:]*y[:,:,:,None,None],axis=-2)
1000 loops, best of 3: 274 µs per loop

%timeit tt=np.einsum('pqrs, pqs->pqr',x,y)
10000 loops, best of 3: 151 µs per loop

np.allclose(zz,tt)
True
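When comparing variants like these, it helps to confirm that both expressions compute the same thing before timing them. With every trailing dimension equal to 3, a misaligned broadcast still produces an array of the expected shape, so a shape check alone is not enough. The sketch below is one way to do this with the standard timeit module. The array v (a batch of 3-vectors for case II), the broadcast forms used here, and the labels are assumptions for illustration, and no timings are claimed.

```python
import timeit
import numpy as np

rng = np.random.default_rng(2)
p, q = 100, 50
x = rng.normal(size=(p, q, 3, 3))
y = rng.normal(size=(p, q, 3, 3))
v = rng.normal(size=(p, q, 3))  # hypothetical vector operand for case II

# Case I: matrix-matrix product for every (p, q) pair.
tt1 = np.einsum('pqrs,pqsu->pqru', x, y)
zz1 = np.sum(x[:, :, :, :, None] * y[:, :, None, :, :], axis=-2)

# Case II: matrix-vector product for every (p, q) pair.
tt2 = np.einsum('pqrs,pqs->pqr', x, v)
zz2 = np.sum(x * v[:, :, None, :], axis=-1)

print(np.allclose(zz1, tt1), np.allclose(zz2, tt2))

candidates = [
    ("I  einsum", lambda: np.einsum('pqrs,pqsu->pqru', x, y)),
    ("I  sum   ", lambda: np.sum(x[:, :, :, :, None] * y[:, :, None, :, :], axis=-2)),
    ("II einsum", lambda: np.einsum('pqrs,pqs->pqr', x, v)),
    ("II sum   ", lambda: np.sum(x * v[:, :, None, :], axis=-1)),
]
for label, fn in candidates:
    # Best of 3 repeats, 100 calls each; results vary by machine and numpy version.
    best = min(timeit.repeat(fn, number=100, repeat=3)) / 100
    print(f"{label}: {best * 1e6:.0f} us per call")
```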
+1
