Python memory management: reading objects of various sizes from an OODB

I read in a collection of objects (tables, such as sqlite3 tables or dataframes) from an object-oriented database, most of which are small enough that the Python garbage collector handles them without incident. However, when they get larger (roughly 10 MB and up), the GC does not seem to be able to keep up.

Pseudocode is as follows:

import gc

walk = walkgenerator('/path')
objs = objgenerator(walk)

with db.transaction(bundle=True, maxSize=10000, maxParts=10):
    oldobj = None
    oldtable = None
    for count, obj in enumerate(objs):   # count was undefined in the original
        currenttable = obj.table
        if oldtable and oldtable in currenttable:
            db.delete(oldobj.path)
        del oldtable                     # attempt to drop the old reference early
        oldtable = currenttable
        del oldobj
        oldobj = obj
        if not count % 100:              # force a cyclic collection every 100 objects
            gc.collect()

I am looking for an elegant way to manage memory, letting Python handle it whenever possible.

Perhaps embarrassingly, I tried using del to help drive the reference counts down.

I tried calling gc.collect() at varying intervals in my for loop:

  • every 100 iterations (no difference),
  • every iteration (the loop slows down considerably, and I still get a memory error of some kind),
  • every 3 iterations (the loop is still slow, but memory blows up anyway).

Suggestions are welcome.

In particular, I would welcome tools to help with introspection. I used Windows Task Manager here, and the memory leak seems to spring up more or less at random. I have limited the transaction size as much as I feel comfortable with, and that helps a bit.

1 answer

There isn't enough information here to say much, but what I can say doesn't fit in a comment, so I'll post it here ;-)

First and foremost, CPython's memory management mainly relies on reference counting. gc.collect() does nothing for you (except burn time) unless garbage objects are involved in reference cycles (object A can be reached from itself by following a chain of pointers starting at A). You don't create any reference cycles in the code shown, but perhaps the database layer does.
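
To illustrate the distinction, here is a minimal sketch (the Node class is purely illustrative): refcounting alone cannot reclaim an object that refers to itself, but the cyclic collector can.

import gc

class Node:
    pass

a = Node()
a.self_ref = a        # the object now points to itself: a reference cycle
del a                 # the refcount never drops to 0, so refcounting can't free it
print(gc.collect())   # the cyclic collector finds the cycle; prints >= 1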

So, after gc.collect() runs, does memory use go down at all? If not, running it is pointless.
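
One cheap check (a sketch; where you place the call in your loop is up to you): gc.collect() returns the number of unreachable objects it found, so you can see directly whether it is doing anything.

import gc

found = gc.collect()   # number of unreachable (cyclic) objects found
print("cyclic garbage found:", found)
# if this is consistently 0, the periodic gc.collect() calls are pure overhead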

My guess is that the database layer is most likely keeping references to your objects alive longer than necessary, but confirming that requires delving into the details of how the database layer is implemented.
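
One standard way to test that hypothesis (not from the original answer, and assuming obj is one of your table objects and that its type supports weak references) is a weakref probe: a weak reference does not keep the object alive, so if it still resolves after you drop your own reference, something else is holding on.

import weakref

probe = weakref.ref(obj)   # does not keep obj alive
del obj
if probe() is not None:
    print("something (the db layer?) still holds a reference")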

One way to get hints is to print the result of sys.getrefcount() applied to various large objects:

>>> import sys
>>> bigobj = [1] * 1000000
>>> sys.getrefcount(bigobj)
2

As the docs note, the result is generally one larger than you might expect, because the refcount of getrefcount()'s argument is temporarily incremented simply because it is being used (temporarily) as an argument.

So if you see a refcount greater than 2, del won't free the object.
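
For instance, a hidden second reference (the cache dict here is hypothetical) keeps the object alive no matter how many times you del your own name for it:

import sys

big = [1] * 1000000
cache = {'big': big}            # a second reference, e.g. held by a library
print(sys.getrefcount(big))     # 3: big, cache['big'], and the temporary argument
del big                         # only decrements the count; the list lives on in cache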

Another way to get hints is to pass the object to gc.get_referrers(). This returns a list of objects that directly refer to the argument (provided that the referrer participates in cyclic gc).
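
A sketch of using it on a large object (bigobj as in the earlier example); printing the referrers' types is often enough of a hint about who is holding on:

import gc

for referrer in gc.get_referrers(bigobj):
    # a dict is often an object's __dict__ or a module namespace;
    # a frame means a local variable somewhere still holds the object
    print(type(referrer), id(referrer))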

By the way, you need to be much clearer about what you mean by "does not seem to be able to keep up" and "memory explodes anyway". I can't guess. What exactly goes wrong? For example, is MemoryError raised? Something else? Tracebacks often yield a world of useful clues.

