Correct garbage collection in extension modules

Two sections of the Python 2.7 documentation mention adding cyclic garbage collection (CGC) support for container objects defined in extension modules.

The Python/C API Reference Manual gives two rules:

  • The memory for the object must be allocated using PyObject_GC_New() or PyObject_GC_NewVar().
  • Once all the fields which may contain references to other containers are initialized, it must call PyObject_GC_Track().

Meanwhile, in Extending and Embedding the Python Interpreter, the Noddy example suggests that adding the Py_TPFLAGS_HAVE_GC flag and filling in the tp_traverse and tp_clear slots is enough to enable CGC support, and the two rules above are NOT followed in practice.

When I modified the Noddy example to actually follow the rules, using PyObject_GC_New() / PyObject_GC_Del() and PyObject_GC_Track() / PyObject_GC_UnTrack(), it unexpectedly raised an assertion error:

Modules/gcmodule.c:348: visit_decref: Assertion "gc->gc.gc_refs != 0" failed. refcount was too small

This leaves me confused about the correct / safe way to implement CGC. Could anyone give some advice or, preferably, a neat example of a container object with CGC support?

2 answers

In most normal cases, you do not need to do the tracking / untracking yourself. The documentation does describe this, but not clearly. In the case of the Noddy example, you definitely do not.

The short version is that a type object contains two function pointers: tp_alloc and tp_free. By default, tp_alloc calls all the right functions when an instance is created (if Py_TPFLAGS_HAVE_GC is set), and tp_free untracks the instance when it is destroyed.

The Noddy documentation says (at the end of the section):

That's pretty much it. If we had written custom tp_alloc or tp_free slots, we'd need to modify them for cyclic garbage collection. Most extensions will use the versions automatically provided.

Unfortunately, the one place that does not make it clear that you don't need to do this is the cyclic garbage collection documentation itself.


Details:

Noddy objects are allocated by the Noddy_new function, which is placed in the tp_new slot. According to the documentation, the main thing the "new" function should do is call the tp_alloc slot. You do not usually write tp_alloc yourself; it defaults to PyType_GenericAlloc().

Looking at PyType_GenericAlloc() in the Python source shows several places where its behavior changes based on PyType_IS_GC(type). First, it calls _PyObject_GC_Malloc instead of PyObject_Malloc, and second, it calls _PyObject_GC_TRACK(obj). [Note that all PyObject_New really does is call PyObject_Malloc and then PyObject_Init; it does not call tp_init.]

Similarly, on deallocation you call the tp_free slot, which is automatically set to PyObject_GC_Del for types with Py_TPFLAGS_HAVE_GC. PyObject_GC_Del includes code that does the same thing as PyObject_GC_UnTrack, so a separate untrack call is not needed.


I am not experienced enough with the C API myself to give you advice, but there are plenty of examples in Python's own container implementations.

Personally, I would start with the tuple implementation, since it is immutable: Objects/tupleobject.c. Then move on to the dict, list and set implementations for further notes on mutable containers.

I can't help but notice that there are calls to PyObject_GC_New(), PyObject_GC_NewVar() and PyObject_GC_Track() throughout, as well as Py_TPFLAGS_HAVE_GC being set.


Source: https://habr.com/ru/post/924482/

