Short
My Python program takes up far more memory than I expect or than memory profiling tools report. I need a strategy to find the memory leak and fix it.
Detailed
I am running a Python 3 script on a 64-bit Linux machine. Almost all of the code is tied to one object:
```python
obj = MyObject(*myArguments)
result = obj.doSomething()
print(result)
```
When `obj` is created, the program reads a text file of approximately 100 MB. Since I store the information in several ways, I expect the whole object to occupy a few hundred MB of memory.
Indeed, measuring its size with `asizeof.asized(obj)` from the pympler package returns about 123 MB. However, `top` tells me that my program takes up about 1 GB of memory.
I understand that local variables in methods take up additional RAM. However, looking through my code, I see that none of these local variables can be that big. I double-checked this using `asizeof.asized` again.
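For what it's worth, the gap between the two numbers can be demonstrated with the stdlib alone (no pympler needed): `sys.getsizeof` reports only the container object itself, while `resource.getrusage` reports the whole process. A minimal sketch, Linux-specific since `ru_maxrss` is in kilobytes there:

```python
import resource
import sys

# ~10 MB of actual data held through a list of references
payload = [bytes(100_000) for _ in range(100)]

shallow = sys.getsizeof(payload)  # size of the list object only (its pointer array)
rss_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss  # peak RSS in KB on Linux

print(f"shallow list size: {shallow} bytes")
print(f"process peak RSS:  {rss_kb} KB")
```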
It would not matter much to me if the script alone required 1 GB of memory. However, I execute several methods in parallel (in 12 processes):
```python
class MyObject():
    def doSomething(arg):
        ...
```
Together, the processes use 8 GB of memory, even though I put all the large objects into shared memory:
```python
self.myLargeNumPyArray = sharedmem.copy(self.myLargeNumPyArray)
```
I have verified with test programs that this memory is really shared.
Validating with `asizeof`, I found in each subprocess that

- `asizeof.asized(self)` is about 1 MB (that is, much smaller than the "original" object, presumably because the shared memory is not counted twice), and
- `asizeof.asized(myOneAndOnlyBigLocalVariable)` is about 230 MB.
Altogether, my program should occupy no more than 123 MB + 12 * 230 MB ≈ 2.8 GB < 8 GB. So why does it require so much memory?
One explanation may be that hidden parts of my object (garbage?) are copied when the program runs in parallel.
Does anyone know a strategy to find out where the memory leak occurs and how I can fix it?
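One baseline strategy I know of, sketched here with the stdlib `tracemalloc` module, is to snapshot allocations and group them by source line. Note that it only sees allocations made through Python's allocator, so NumPy's internal buffers or shared-memory segments may not show up:

```python
import tracemalloc

tracemalloc.start()

# Stand-in for the unknown allocation we are hunting for
suspect = [list(range(1000)) for _ in range(1000)]

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics("lineno")
for stat in top_stats[:3]:
    print(stat)  # source file/line plus cumulative size and count
```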
I have read many threads on memory profiling, e.g. Memory profiling in python 3, Is there any working profiler for Python3, Which Python memory profiler is recommended?, and How can I profile memory usage in Python?, but none of the recommended tools explain where the memory goes.
Update
I was asked to provide a minimal code example. The code below shows the same memory-consumption problems in the parallel part as the original. I have already figured out the problem in the non-parallel part of my code: I had a large NumPy array with data type `object` as an instance variable. Because of this data type, the array cannot be placed in shared memory, and `asized` reports only a small size. Thanks @user2357112 for helping me figure this out!
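The under-reporting is easy to reproduce with plain stdlib containers: a list, like an `object`-dtype array, stores only pointers, so a shallow size measurement misses almost everything it references. A small sketch:

```python
import sys

big = [bytes(100_000) for _ in range(100)]  # ~10 MB of referenced data

shallow = sys.getsizeof(big)  # counts only the list's pointer array
deep = shallow + sum(sys.getsizeof(item) for item in big)

print(shallow)  # on the order of 1 KB
print(deep)     # on the order of 10 MB
```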
Therefore, I would like to focus on the problem in the parallel part: inserting values into the queue in the `singleSourceShortestPaths` method (marked below with a comment) changes the memory consumption from 1.5 GB to 10 GB. Any ideas how to explain this behavior?
```python
import numpy as np
from heapdict import heapdict
from pympler import asizeof
import sharedmem

class RoadNetwork():
    strType = "|S10"

    def __init__(self):
        vertexNo = 1000000
        self.edges = np.zeros(1500000,
                              dtype={"names": ["ID", "from_to", "from_to_original",
                                               "cost", "inspection", "spot"],
                                     "formats": [self.strType, "2int", "2"+self.strType,
                                                 "double", "3bool", "2int"]})
        self.edges["ID"] = np.arange(self.edges.size)
        self.edges["from_to_original"][:vertexNo, 0] = np.arange(vertexNo)
        self.edges["from_to_original"][vertexNo:, 0] = np.random.randint(0, vertexNo, self.edges.size-vertexNo)
        self.edges["from_to_original"][:, 1] = np.random.randint(0, vertexNo, self.edges.size)
        vertexIDs = np.unique(self.edges["from_to_original"])
        self.vertices = np.zeros(vertexIDs.size,
                                 {"names": ["ID", "type", "lakeID"],
                                  "formats": [self.strType, "int", self.strType]})

    def singleSourceShortestPaths(self, sourceIndex):
        vertexData = np.zeros(self.vertices.size,
                              dtype={"names": ["predecessor", "edge", "cost"],
                                     "formats": ["int", "2int", "double"]})
        # Crucial line!! Commenting this out decreases memory usage
        # by 7 GB in the parallel part
        queue = np.zeros((self.vertices.size, 2), dtype=np.double)
        queue[:, 0] = np.arange(self.vertices.size)
        queue = heapdict(queue)
        print("self in singleSourceShortestPaths", asizeof.asized(self))
        print("queue in singleSourceShortestPaths", asizeof.asized(queue))
        print("vertexData in singleSourceShortestPaths", asizeof.asized(vertexData))
        # do stuff (in my real program a Dijkstra algorithm would follow)
        # I inserted these lines as an ugly version of 'wait()' to
        # give me enough time to measure the memory consumption in 'top'
        for i in range(10000000000):
            pass
        return vertexData

    def determineFlowInformation(self):
        print("self in determineFlowInformation", asizeof.asized(self))
        f = lambda i: self.singleSourceShortestPaths(i)
        self.parmap(f, range(30))

    def parmap(self, f, argList):
        """
        Executes f(arg) for arg in argList in parallel;
        returns a list of the results in the same order as the
        arguments; invalid results (None) are ignored
        """
        self.__make_np_arrays_sharable()
        with sharedmem.MapReduce() as pool:
            results, to_do_list = zip(*pool.map(f, argList))
        return results

    def __make_np_arrays_sharable(self):
        """
        Replaces all numpy array object variables with equivalents
        in shared memory, which should have the same behaviour /
        properties as the original numpy arrays
        """
        varDict = self.__dict__
        for key, var in varDict.items():
            if type(var) is np.ndarray:
                varDict[key] = sharedmem.copy(var)

if __name__ == '__main__':
    network = RoadNetwork()
    print(asizeof.asized(network, detail=1))
    for key, var in network.__dict__.items():
        print(key, asizeof.asized(var))
    network.determineFlowInformation()
```
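My current guess at why the crucial line is so expensive, sketched with stdlib types only: building a dict-backed heap from the array boxes each compact 8-byte double into a full Python `float` object plus a hash-table slot, which multiplies the per-entry footprint roughly tenfold. The numbers below come from a smaller stand-in, not from the script above:

```python
import sys
from array import array

n = 100_000  # stand-in; self.vertices.size is ~1,000,000 in the script above

packed = array("d", range(n))               # contiguous C doubles: 8 bytes per entry
boxed = {float(i): 0.0 for i in range(n)}   # roughly what a dict-backed heap stores

payload = packed.itemsize * len(packed)         # 800,000 bytes of raw data
table = sys.getsizeof(boxed)                    # the hash table alone: several MB
floats = sum(sys.getsizeof(k) for k in boxed)   # plus ~24 bytes per boxed key

print(payload, table, floats)
```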