Memory Growth Through NumPy Broadcast Operations

I use NumPy to process some large data matrices (about 50 GB in size). The machine I run this code on has 128 GB of RAM, so simple linear operations on data of this size should not be a memory problem.

However, I am seeing tremendous memory growth (to over 100 GB) when running the following Python code:

 import numpy as np

 # memory allocations (everything works fine)
 a = np.zeros((1192953, 192, 32), dtype='f8')
 b = np.zeros((1192953, 192), dtype='f8')
 c = np.zeros((192, 32), dtype='f8')

 a[:] = b[:, :, np.newaxis] - c[np.newaxis, :, :]  # memory explodes here

Please note that the initial memory allocations complete without any problems. However, when I perform the subtraction using broadcasting, memory usage grows to more than 100 GB. I always thought that broadcasting would avoid additional memory allocations, but now I'm not sure that this is always the case.
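To show what I expected, here is a minimal sketch with small made-up shapes (not my real data): the np.newaxis view itself copies nothing, so I assumed the subtraction would be cheap as well:

 import numpy as np

 # toy shapes, made up so the sketch runs instantly
 b_small = np.zeros((1000, 192), dtype='f8')
 c_small = np.zeros((192, 32), dtype='f8')

 # adding an axis only creates a view -- no data is copied here
 b_view = b_small[:, :, np.newaxis]
 print(np.may_share_memory(b_view, b_small))   # True: it is a view

 # but the full result array still has to be allocated for the subtraction
 result = b_view - c_small                     # (1000, 192, 32) array
 print(result.nbytes)                          # 1000 * 192 * 32 * 8 = 49152000 bytes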

Can someone explain why this memory growth occurs, and how the above code can be rewritten using more memory-efficient constructs?

I am running the code with Python 2.7 in an IPython Notebook.

2 answers

@rth's suggestion to do the operation in smaller batches is a good one. You can also try using the np.subtract function and give it the target array, to avoid creating an additional temporary array for the result. You also do not need to index c as c[np.newaxis, :, :], since broadcasting adds the missing leading axis automatically.

So instead of

 a[:] = b[:, :, np.newaxis] - c[np.newaxis, :, :]  # memory explodes here

try

 np.subtract(b[:, :, np.newaxis], c, a) 

The third argument to np.subtract is the target array.
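If your NumPy version accepts the keyword form, the same call can also be written with out=, which some people find more readable; a minimal sketch with small made-up shapes:

 import numpy as np

 # small made-up shapes so the sketch runs instantly
 a = np.empty((1000, 192, 32), dtype='f8')
 b = np.random.rand(1000, 192)
 c = np.random.rand(192, 32)

 # write the broadcast difference directly into a, so no temporary
 # (1000, 192, 32) result array is created
 np.subtract(b[:, :, np.newaxis], c, out=a)

 # same values as the naive expression, without the extra allocation
 print(np.allclose(a, b[:, :, np.newaxis] - c))   # True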


Well, your array a alone already takes 1192953 * 192 * 32 * 8 bytes / 1e9 ≈ 58 GB of memory.

Broadcasting does not create additional memory allocations for the input arrays themselves, but the result of

 b[:, :, np.newaxis] - c[np.newaxis, :, :] 

is stored in a temporary array. So on that line you have allocated at least two arrays with the shape of a, for a total memory use of more than 116 GB.
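As a quick sanity check on those numbers, the arithmetic alone (nothing is allocated here):

 import numpy as np

 itemsize = np.dtype('f8').itemsize        # 8 bytes per float64
 n_elements = 1192953 * 192 * 32           # number of elements in a

 gb_per_array = n_elements * itemsize / 1e9
 print(gb_per_array)                       # ~58.6 GB for a alone
 print(2 * gb_per_array)                   # ~117 GB with the temporary result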

You can avoid this problem by working on a smaller chunk of your array at a time:

 # process the first axis in chunks; the stepped range also covers the
 # final partial chunk (b.shape[0] is not a multiple of CHUNK_SIZE)
 CHUNK_SIZE = 100000
 for start in range(0, b.shape[0], CHUNK_SIZE):
     sl = slice(start, start + CHUNK_SIZE)
     a[sl] = b[sl, :, np.newaxis] - c[np.newaxis, :, :]

It will be a little slower, but it uses much less memory.
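To convince yourself that the chunked loop gives the same result, here is a sketch with made-up toy shapes small enough to also compute the direct broadcast for comparison:

 import numpy as np

 # made-up toy shapes, small enough to compare against the direct result
 b = np.random.rand(250, 192)
 c = np.random.rand(192, 32)
 a = np.empty((250, 192, 32))

 CHUNK_SIZE = 100
 for start in range(0, b.shape[0], CHUNK_SIZE):
     sl = slice(start, start + CHUNK_SIZE)
     a[sl] = b[sl, :, np.newaxis] - c[np.newaxis, :, :]

 # the chunked result matches the full broadcast
 print(np.allclose(a, b[:, :, np.newaxis] - c))   # True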

