Python memory usage for objects and processes

I wrote the following code:

    from hurry.filesize import size
    from pysize import get_size
    import os
    import psutil

    def load_objects():
        process = psutil.Process(os.getpid())
        print "start method"
        process = psutil.Process(os.getpid())
        print "process consumes " + size(process.memory_info().rss)
        objects = make_a_call()
        print "total size of objects is " + size(get_size(objects))
        print "process consumes " + size(process.memory_info().rss)
        print "exit method"

    def main():
        process = psutil.Process(os.getpid())
        print "process consumes " + size(process.memory_info().rss)
        load_objects()
        print "process consumes " + size(process.memory_info().rss)

get_size() returns the memory consumption of an object; its implementation is the pysize code quoted in the first answer below.

I get the following output:

    process consumes 21M
    start method
    total size of objects is 20M
    process consumes 29M
    exit method
    process consumes 29M
  • Why did the objects consume 20M if the process consumed only 8M more?
  • When I exit the method, shouldn't the memory decrease back to 21M, since the garbage collector should clear the consumed memory?
python garbage-collection
2 answers
  • Most likely, this is due to an inaccuracy in your code.

Here is a fully working example (Python 2.7) that reproduces the same problem (I changed your source code slightly for simplicity):

    from hurry.filesize import size
    from pysize import get_size
    import os
    import psutil

    def make_a_call():
        return range(1000000)

    def load_objects():
        process = psutil.Process(os.getpid())
        print "start method"
        process = psutil.Process(os.getpid())
        print "process consumes ", size(process.memory_info().rss)
        objects = make_a_call()
        # FIXME
        print "total size of objects is ", size(get_size(objects))
        print "process consumes ", size(process.memory_info().rss)
        print "exit method"

    def main():
        process = psutil.Process(os.getpid())
        print "process consumes " + size(process.memory_info().rss)
        load_objects()
        print "process consumes " + size(process.memory_info().rss)

    main()

Here is the output:

    process consumes 7M
    start method
    process consumes 7M
    total size of objects is 30M
    process consumes 124M
    exit method
    process consumes 124M

The difference is ~100 MB.

And here is the corrected version of the code:

    from hurry.filesize import size
    from pysize import get_size
    import os
    import psutil

    def make_a_call():
        return range(1000000)

    def load_objects():
        process = psutil.Process(os.getpid())
        print "start method"
        process = psutil.Process(os.getpid())
        print "process consumes ", size(process.memory_info().rss)
        objects = make_a_call()
        print "process consumes ", size(process.memory_info().rss)
        print "total size of objects is ", size(get_size(objects))
        print "exit method"

    def main():
        process = psutil.Process(os.getpid())
        print "process consumes " + size(process.memory_info().rss)
        load_objects()
        print "process consumes " + size(process.memory_info().rss)

    main()

And here is the updated output:

    process consumes 7M
    start method
    process consumes 7M
    process consumes 38M
    total size of objects is 30M
    exit method
    process consumes 124M

Did you notice the difference? You measure the size of the objects before taking the final measurement of the process, and that measurement itself causes additional memory consumption. Let's see why this happens; here is the source of get_size: https://github.com/bosswissam/pysize/blob/master/pysize.py

    import sys
    import inspect

    def get_size(obj, seen=None):
        """Recursively finds size of objects in bytes"""
        size = sys.getsizeof(obj)
        if seen is None:
            seen = set()
        obj_id = id(obj)
        if obj_id in seen:
            return 0
        # Important mark as seen *before* entering recursion to gracefully handle
        # self-referential objects
        seen.add(obj_id)
        if hasattr(obj, '__dict__'):
            for cls in obj.__class__.__mro__:
                if '__dict__' in cls.__dict__:
                    d = cls.__dict__['__dict__']
                    if inspect.isgetsetdescriptor(d) or inspect.ismemberdescriptor(d):
                        size += get_size(obj.__dict__, seen)
                    break
        if isinstance(obj, dict):
            size += sum((get_size(v, seen) for v in obj.values()))
            size += sum((get_size(k, seen) for k in obj.keys()))
        elif hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes, bytearray)):
            size += sum((get_size(i, seen) for i in obj))
        return size

A lot is going on here! The most noteworthy point is that get_size() records every object it has seen (by id) in the seen set in order to handle circular references. If you remove that line, memory does not grow nearly as much in either case.
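To get a feel for how much memory that bookkeeping alone can take, here is a small sketch of my own (not part of the answer); it assumes 64-bit CPython and simply builds a set holding the ids of a million integers, which is roughly what the seen set ends up containing after get_size() walks the list from the example above.

    import sys

    # Build a set of a million ids, similar to the `seen` set that get_size()
    # accumulates while walking a list of a million integers.
    ids = set(id(x) for x in range(1000000))

    # The set's hash table alone is typically tens of megabytes on 64-bit CPython,
    # before counting the integer objects that hold the ids themselves.
    print(sys.getsizeof(ids))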

  1. First of all, this behavior depends heavily on whether you are running CPython or some other implementation. In CPython's case it can happen because the interpreter does not always return memory to the OS immediately (a short sketch illustrating this follows the quote below).

Here is a good article on the subject; quoting from it:

If you create a large object and delete it again, Python has probably released the memory, but the memory allocators involved don't necessarily return the memory to the operating system, so it may look as if the Python process uses a lot more virtual memory than it actually uses.
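As a quick way to observe this, here is a small sketch of my own (not from the answer), assuming CPython and the psutil package: after the large list is deleted and a collection is forced, the RSS reported by the OS often stays close to its peak rather than dropping back.

    import os
    import gc
    import psutil

    proc = psutil.Process(os.getpid())
    print("at start: %d MB" % (proc.memory_info().rss // 1024 // 1024))

    data = list(range(1000000))    # allocate a large list of integers
    print("after allocation: %d MB" % (proc.memory_info().rss // 1024 // 1024))

    del data
    gc.collect()                   # force a collection pass
    # The freed blocks usually stay inside CPython's allocator, so RSS may not
    # drop back to the starting figure.
    print("after del + gc.collect(): %d MB" % (proc.memory_info().rss // 1024 // 1024))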

  • Why should the process consume more than 8M of overhead?
  • Garbage collection does not necessarily happen immediately. See the documentation (a short sketch of cyclic collection follows the quotes below):

Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. An implementation is allowed to postpone garbage collection or omit it altogether; it is a matter of implementation quality how garbage collection is implemented, as long as no objects are collected that are still reachable.

CPython implementation detail: CPython currently uses a reference-counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references. See the documentation of the gc module for information on controlling the collection of cyclic garbage. Other implementations act differently and CPython may change. Do not depend on immediate finalization of objects when they become unreachable (so you should always close files explicitly).
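For the cyclic part specifically, here is a small sketch of my own (not from the documentation or the answer) showing that objects caught in a reference cycle are not reclaimed by reference counting alone; they wait for the cyclic collector, which you can trigger explicitly with gc.collect():

    import gc

    class Node(object):
        pass

    a = Node()
    b = Node()
    a.partner = b
    b.partner = a    # a and b now reference each other, forming a cycle

    del a
    del b            # neither reference count reaches zero on its own

    # The cycle is only reclaimed when the cyclic collector runs; force it here.
    collected = gc.collect()
    print("unreachable objects collected: %d" % collected)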

