Memory problem when assigning to a NumPy memmap array

In the following code:

    @profile
    def do():
        import random
        import numpy as np

        image = np.memmap('image.np', mode='w+', dtype=np.float32, shape=(10000, 10000))

        print("Before assignment")

        x = random.uniform(1000, 9000)
        y = random.uniform(1000, 9000)
        imin = int(x) - 128
        imax = int(x) + 128
        jmin = int(y) - 128
        jmax = int(y) + 128
        data = np.random.random((256,256))
        image[imin:imax, jmin:jmax] = image[imin:imax, jmin:jmax] + data
        del x, y, imin, imax, jmin, jmax, data

        print("After assignment")

    do()

The memory used at the second print statement has increased compared to the first one. Here is the output of memory_profiler:

    Line #    Mem usage    Increment   Line Contents
    ================================================
         1                             @profile
         2                             def do():
         3    10.207 MB     0.000 MB
         4    10.734 MB     0.527 MB       import random
         5    21.066 MB    10.332 MB       import numpy as np
         6
         7    21.105 MB     0.039 MB       image = np.memmap('image.np', mode='w+', dtype=np.float32, shape=(10000, 10000))
         8
         9    21.109 MB     0.004 MB       print("Before assignment")
        10
        11    21.109 MB     0.000 MB       x = random.uniform(1000, 9000)
        12    21.109 MB     0.000 MB       y = random.uniform(1000, 9000)
        13    21.109 MB     0.000 MB       imin = int(x) - 128
        14    21.109 MB     0.000 MB       imax = int(x) + 128
        15    21.113 MB     0.004 MB       jmin = int(y) - 128
        16    21.113 MB     0.000 MB       jmax = int(y) + 128
        17    21.625 MB     0.512 MB       data = np.random.random((256,256))
        18    23.574 MB     1.949 MB       image[imin:imax, jmin:jmax] = image[imin:imax, jmin:jmax] + data
        19
        20    23.574 MB     0.000 MB       del x, y, imin, imax, jmin, jmax, data
        21
        22    23.574 MB     0.000 MB       print("After assignment")

RAM usage increased from 21.109 MB to 23.574 MB. This causes problems if I put this block of code in a loop:

    Line #    Mem usage    Increment   Line Contents
    ================================================
         1                             @profile
         2                             def do():
         3    10.207 MB     0.000 MB
         4    10.734 MB     0.527 MB       import random
         5    21.066 MB    10.332 MB       import numpy as np
         6
         7    21.105 MB     0.039 MB       image = np.memmap('image.np', mode='w+', dtype=np.float32, shape=(10000, 10000))
         8
         9    21.109 MB     0.004 MB       print("Before assignment")
        10
        11   292.879 MB   271.770 MB       for i in range(1000):
        12
        13   292.879 MB     0.000 MB           x = random.uniform(1000, 9000)
        14   292.879 MB     0.000 MB           y = random.uniform(1000, 9000)
        15   292.879 MB     0.000 MB           imin = int(x) - 128
        16   292.879 MB     0.000 MB           imax = int(x) + 128
        17   292.879 MB     0.000 MB           jmin = int(y) - 128
        18   292.879 MB     0.000 MB           jmax = int(y) + 128
        19   292.879 MB     0.000 MB           data = np.random.random((256,256))
        20   292.879 MB     0.000 MB           image[imin:imax, jmin:jmax] = image[imin:imax, jmin:jmax] + data
        21
        22   292.879 MB     0.000 MB           del x, y, imin, imax, jmin, jmax, data
        23
        24   292.879 MB     0.000 MB       print("After assignment")

and the RAM used increases with each iteration. Is there any way to avoid this problem? Is this a NumPy bug, or am I doing something wrong?

EDIT: This is on Mac OS X, and I see the problem with both Python 2.7 and 3.2, with NumPy 1.6.2 and later (including the development version).

EDIT 2: I also see the problem on Linux.

2 answers

I assume that NumPy first writes the data to a buffer and only later to the file, presumably for performance reasons.

I did some tests, and after your assignment line the file image.np was not changed. The file was modified only after the image object was deleted or image.flush() was called. If memory is of utmost importance, you can try putting image.flush() inside your loop to see if it fixes the problem.
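A minimal sketch of that suggestion, assuming the same loop as in the question, with a flush() call added at the end of each iteration:

    import random
    import numpy as np

    image = np.memmap('image.np', mode='w+', dtype=np.float32, shape=(10000, 10000))

    for i in range(1000):
        x = random.uniform(1000, 9000)
        y = random.uniform(1000, 9000)
        imin, imax = int(x) - 128, int(x) + 128
        jmin, jmax = int(y) - 128, int(y) + 128
        data = np.random.random((256, 256))
        image[imin:imax, jmin:jmax] = image[imin:imax, jmin:jmax] + data
        # Write any dirty pages back to image.np now, instead of letting
        # them accumulate in memory until the memmap is deleted.
        image.flush()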


For optimization reasons, the data may not be written out by np.memmap until the destructor of image is called. You can avoid this by opening image in copy-on-write mode (note that in copy-on-write mode, changes are kept in memory and never written back to the file on disk):

 image = np.memmap('image.np', mode='c', dtype=np.float32, shape=(10000, 10000)) 

Alternatively, you could call del image and reopen the memmap once per loop iteration, as sketched below, but that doesn't seem like a good idea.
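For completeness, here is a sketch of that alternative; it assumes the file has already been created once with mode='w+', since mode='r+' requires an existing file:

    import random
    import numpy as np

    # Create the backing file once.
    np.memmap('image.np', mode='w+', dtype=np.float32, shape=(10000, 10000))

    for i in range(1000):
        # Map the existing file read/write for this iteration only.
        image = np.memmap('image.np', mode='r+', dtype=np.float32, shape=(10000, 10000))
        x = random.uniform(1000, 9000)
        y = random.uniform(1000, 9000)
        imin, imax = int(x) - 128, int(x) + 128
        jmin, jmax = int(y) - 128, int(y) + 128
        data = np.random.random((256, 256))
        image[imin:imax, jmin:jmax] = image[imin:imax, jmin:jmax] + data
        # Deleting the memmap runs its destructor, which flushes the changes
        # to disk and unmaps the file, releasing the accumulated pages.
        del image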

