How to avoid huge additional memory consumption when using numpy vectorize?

This code below best illustrates my problem:

The console output (it takes about 8 minutes even to reach the first test) shows the 512x512x512 arrays of 16-bit values consuming no more than expected (256 MB each), and watching the process in "top" it stays under 600 MB overall, as expected.

However, while the vectorized version of the function is being called, the process balloons to an enormous size (over 7 GB!). Even the most obvious explanation I can think of, that vectorize converts the input and output to float64 internally, would only account for a couple of gigabytes, and that is even though the vectorized function returns int16 and the returned array certainly is int16. Is there any way to avoid this? Am I using or understanding the otypes argument of vectorize incorrectly?
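To make the size arithmetic explicit, here is a quick sketch of what I am assuming (approximate, ignoring allocator overhead):

n = 512 * 512 * 512                          # 134,217,728 elements
print "int16  : %4d MiB" % (n * 2 / 2**20)   # 256 MiB  -- the input and the int16 result
print "float64: %4d MiB" % (n * 8 / 2**20)   # 1024 MiB -- even a full float64 temporary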

import numpy as np
import subprocess

def logmem():
    # print the free-memory line from /proc/meminfo
    subprocess.call('cat /proc/meminfo | grep MemFree', shell=True)

def fn(x):
    return np.int16(x*x)

def test_plain(v):
    print "Explicit looping:"
    logmem()
    r = np.zeros(v.shape, dtype=np.int16)
    for z in xrange(v.shape[0]):
        for y in xrange(v.shape[1]):
            for x in xrange(v.shape[2]):
                r[z, y, x] = fn(x)
    print type(r[0, 0, 0])
    logmem()
    return r

vecfn = np.vectorize(fn, otypes=[np.int16])

def test_vectorize(v):
    print "Vectorize:"
    logmem()
    r = vecfn(v)
    print type(r[0, 0, 0])
    logmem()
    return r

logmem()
s = (512, 512, 512)
v = np.ones(s, dtype=np.int16)
logmem()
test_plain(v)
test_vectorize(v)
v = None
logmem()

I am using the stock versions of Python and numpy that ship with the amd64 Debian Squeeze system (Python 2.6.6, numpy 1.4.1).

2 answers

You can read the source code of vectorize(). It converts the array's dtype to object and calls np.frompyfunc() to create a ufunc from your Python function; that ufunc returns an array of objects, and finally vectorize() converts the object array to an int16 array.

An array whose dtype is object uses a great deal of memory, because every element is a separately allocated Python object rather than a raw 16-bit value.

Doing element-wise computation through a Python function is slow in any case, even after it has been converted to a ufunc by frompyfunc().
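A minimal sketch of that pipeline, run on a tiny array so it is cheap to try (the per-element overhead figures in the comments are rough assumptions for 64-bit CPython, not measured values):

import numpy as np

def fn(x):
    return np.int16(x * x)

v = np.ones((4, 4), dtype=np.int16)

# frompyfunc() always builds a ufunc whose outputs have dtype=object
ufn = np.frompyfunc(fn, 1, 1)

tmp = ufn(v)                 # intermediate object array: one boxed Python number per element
print tmp.dtype              # object
r = tmp.astype(np.int16)     # only this final cast produces the int16 result
print r.dtype                # int16

# For a 512x512x512 input that intermediate holds ~134 million boxed numbers;
# at 8 bytes of pointer plus a few tens of bytes per boxed object, the
# temporary alone easily reaches several GB.

For a function as simple as fn, a plain array expression (for example v.astype(np.int32) ** 2, cast back to int16) sidesteps the object temporary entirely.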


A basic problem with vectorization is that all intermediate values are also full-size arrays. Although this is a convenient way to get a decent increase in speed, it can be very inefficient with memory and will constantly thrash your processor cache. To overcome this problem you need an approach whose explicit loops run at compiled speed rather than at Python speed. The best ways to do this are to use cython, Fortran code wrapped with f2py, or numexpr. You can find a comparison of these approaches here, although it is more oriented towards speed than memory usage.
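For instance, here is a small sketch using numexpr (this assumes the numexpr package is installed, which is not part of the question's setup); it compiles the expression and evaluates it in cache-sized blocks instead of building full-size temporaries:

import numpy as np
import numexpr as ne

v = np.ones((512, 512, 512), dtype=np.int16)

# The expression is evaluated blockwise, keeping intermediates in cache.
# numexpr may upcast small integer types internally, hence the explicit
# cast back to int16 on the result.
r = ne.evaluate("v * v").astype(np.int16)
print r.dtype, r.shape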

