Why does comparing a numpy array with a list consume so much memory?

This bit me recently. I worked around it by removing all comparisons of numpy arrays with lists from my code. But why did the garbage collector miss it?

Run this and watch it eat your memory:

    import numpy as np

    r = np.random.rand(2)
    l = []
    while True:
        r == l

Running on 64-bit Ubuntu 10.04, virtualenv 1.7.2, Python 2.7.3, Numpy 1.6.2

2 answers

Just in case someone stumbles on this and wonders...

@Dugal yes, I believe this is a memory leak in current versions of numpy (September 2012) that occurs when certain exceptions are raised (see this and this). Why adding the gc call, as @BiRico did, "fixes" it seems strange to me; perhaps it has to be done right after the exception is raised? Maybe it is an oddity in how Python garbage-collects tracebacks. If someone knows the exception handling and garbage collection internals of CPython, I would be interested.

Workaround: This is not directly related to lists; it applies to, for example, most broadcasting exceptions (the empty list does not match the size of the array, and an empty array leads to the same leak; an exception is prepared internally that never surfaces). So, as a workaround, you should probably check first that the shapes are compatible (if you do this a lot; otherwise I would not really worry, since this only leaks a small string, if I understood correctly).
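A minimal sketch of such a shape check, in case it helps. The helper names `shapes_broadcastable` and `safe_equal` are hypothetical (they are not part of numpy), and returning `False` for incompatible shapes is an assumption about what the caller wants. The point is only to avoid ever reaching the comparison that fails to broadcast:

```python
import numpy as np


def shapes_broadcastable(shape_a, shape_b):
    """Return True if two shapes are compatible under broadcasting rules."""
    # Compare dimensions from the trailing end; each pair must be equal
    # or one of the two must be 1. Missing leading dimensions are fine.
    for a, b in zip(reversed(shape_a), reversed(shape_b)):
        if a != b and a != 1 and b != 1:
            return False
    return True


def safe_equal(arr, other):
    """Hypothetical helper: compare only when the shapes can broadcast."""
    other = np.asarray(other)
    if not shapes_broadcastable(arr.shape, other.shape):
        # Assumption: treat a shape mismatch as "not equal" instead of
        # letting numpy attempt (and fail) the broadcast.
        return False
    return arr == other


r = np.random.rand(2)
print(safe_equal(r, []))          # shapes (2,) and (0,) do not broadcast
print(safe_equal(r, r.tolist()))  # elementwise comparison
```

The check itself never raises, so it sidesteps the failing broadcast entirely rather than catching it afterwards.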

FIXED: This issue will be fixed with numpy 1.7.


Sorry, I cannot give a more complete answer, but this seems to be related to garbage collection. I was able to recreate this problem using python 2.7.2, numpy 1.6.1 on Redhat 5.8. However, when I tried the following, memory usage returned to normal.

    import gc
    import numpy as np

    r = np.random.rand(2)
    l = []
    while True:
        r == l
        gc.collect()