Prevent TextIOWrapper from closing on GC in a Py2 / Py3 compatible way

What I need to accomplish:

Given a binary file, decode it in several different ways, each exposed through the TextIOBase API. Ideally, the resulting text files can be passed along without my having to track their lifespan explicitly.

Unfortunately, wrapping a BufferedReader causes that reader to be closed when the TextIOWrapper goes out of scope.

Here is a simple demonstration of this:

    In [1]: import io

    In [2]: def mangle(x):
       ...:     io.TextIOWrapper(x)  # Will get GCed causing __del__ to call close
       ...:

    In [3]: f = io.open('example', mode='rb')

    In [4]: f.closed
    Out[4]: False

    In [5]: mangle(f)

    In [6]: f.closed
    Out[6]: True

I can fix this in Python 3 by overriding __del__ (this is a reasonable solution for my use case, since I have full control over the decoding process and only need to expose a very uniform API at the end):

    In [1]: import io

    In [2]: class MyTextIOWrapper(io.TextIOWrapper):
       ...:     def __del__(self):
       ...:         print("I've been GC'ed")
       ...:

    In [3]: def mangle2(x):
       ...:     MyTextIOWrapper(x)
       ...:

    In [4]: f2 = io.open('example', mode='rb')

    In [5]: f2.closed
    Out[5]: False

    In [6]: mangle2(f2)
    I've been GC'ed

    In [7]: f2.closed
    Out[7]: False

However, this does not work in Python 2:

    In [7]: class MyTextIOWrapper(io.TextIOWrapper):
       ...:     def __del__(self):
       ...:         print("I've been GC'ed")
       ...:

    In [8]: def mangle2(x):
       ...:     MyTextIOWrapper(x)
       ...:

    In [9]: f2 = io.open('example', mode='rb')

    In [10]: f2.closed
    Out[10]: False

    In [11]: mangle2(f2)
    I've been GC'ed

    In [12]: f2.closed
    Out[12]: True

I spent a bit of time looking at the Python source code, and it looks remarkably similar between 2.7 and 3.4, so I don't understand why the __del__ inherited from IOBase cannot be overridden in Python 2 (it is not even visible in dir), yet still seems to be executed. Python 3 works exactly as expected.

Is there anything I can do?

4 answers

Just detach your TextIOWrapper() object before letting it be garbage collected:

    def mangle(x):
        wrapper = io.TextIOWrapper(x)
        wrapper.detach()

The TextIOWrapper() object only closes streams it is attached to. If you cannot alter the code where the object goes out of scope, simply keep a reference to the TextIOWrapper() object locally and detach it at that point.
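As a minimal sketch of that advice applied to the question's goal (decoding the same binary stream in several ways), the helper below wraps, reads, and detaches in one place. The name `read_as` and the in-memory `io.BytesIO` stand-in for the real file are assumptions made for illustration only.

```python
import io


def read_as(raw, encoding):
    """Decode all of `raw` using `encoding`, leaving `raw` open afterwards.

    `read_as` is a hypothetical helper name, not part of the io module.
    """
    raw.seek(0)  # every decoding pass starts from the beginning
    wrapper = io.TextIOWrapper(raw, encoding=encoding)
    try:
        return wrapper.read()
    finally:
        # Detach so that garbage collection of `wrapper` cannot close `raw`.
        wrapper.detach()


# An in-memory stand-in for the binary file from the question.
raw = io.BytesIO(u"caf\xe9".encode("utf-8"))

as_utf8 = read_as(raw, "utf-8")
as_latin1 = read_as(raw, "latin-1")
```

Because each wrapper is detached before it goes out of scope, `raw` can be wrapped again for the next encoding.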

If you must subclass TextIOWrapper(), just call detach() in the __del__ hook:

    class DetachingTextIOWrapper(io.TextIOWrapper):
        def __del__(self):
            self.detach()
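One caveat worth illustrating: detach() raises ValueError if the wrapper was already detached or closed by hand. The variant below is a sketch that guards against this; the class name `GuardedDetachingTextIOWrapper` is hypothetical and the `io.BytesIO` object merely stands in for a real binary file.

```python
import gc
import io


class GuardedDetachingTextIOWrapper(io.TextIOWrapper):
    """Hypothetical variant that tolerates an earlier detach() or close()."""

    def __del__(self):
        try:
            self.detach()
        except ValueError:
            # Already detached or closed explicitly; nothing left to release.
            pass


def mangle(x):
    # The wrapper is discarded as soon as this function returns.
    GuardedDetachingTextIOWrapper(x, encoding="utf-8")


raw = io.BytesIO(b"data")
mangle(raw)
gc.collect()  # not needed on CPython, but makes the collection explicit

buffer_survived = not raw.closed
```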

EDIT:

Just call detach first, thanks martijn-pieters!


It turns out there is basically nothing that can be done about the destructor calling close in Python 2.7. This is hard-coded into the C code. Instead, we can modify close so that it does not close the buffer while __del__ is running (__del__ is executed before _PyIOBase_finalize in the C code, which gives us a chance to change the behaviour of close). This lets close work as expected while preventing the GC from closing the buffer.

    import io


    class SaneTextIOWrapper(io.TextIOWrapper):
        def __init__(self, *args, **kwargs):
            self._should_close_buffer = True
            super(SaneTextIOWrapper, self).__init__(*args, **kwargs)

        def __del__(self):
            # Accept the inevitability of the buffer being closed by the destructor
            # because of this line in Python 2.7:
            # https://github.com/python/cpython/blob/2.7/Modules/_io/iobase.c#L221
            self._should_close_buffer = False
            self.close()  # Actually close for Python 3 because it is an override.
                          # We can't call super because Python 2 doesn't actually
                          # have a '__del__' method for IOBase (hence this
                          # workaround). Close is idempotent so it won't matter
                          # that Python 2 will end up calling this twice

        def close(self):
            # We can't stop Python 2.7 from calling close in the destructor
            # so instead we can prevent the buffer from being closed with a flag.

            # Based on:
            # https://github.com/python/cpython/blob/2.7/Lib/_pyio.py#L1586
            # https://github.com/python/cpython/blob/3.4/Lib/_pyio.py#L1615
            if self.buffer is not None and not self.closed:
                try:
                    self.flush()
                finally:
                    if self._should_close_buffer:
                        self.buffer.close()

My previous solution here used _pyio.TextIOWrapper, which is slower than the above because it is written in Python rather than C.

It involved simply overriding __del__ with a no-op, which also works in Py2 / 3.
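The earlier code is not shown in this answer, so the following is only a guess at what it looked like: a subclass of the pure-Python `_pyio.TextIOWrapper` whose `__del__` does nothing. The class name `NoCloseOnGCTextIOWrapper` is an assumption, and `io.BytesIO` stands in for a real binary file.

```python
import _pyio
import gc
import io


class NoCloseOnGCTextIOWrapper(_pyio.TextIOWrapper):
    """Hypothetical reconstruction: the pure-Python wrapper with a no-op __del__."""

    def __del__(self):
        # Deliberately empty: garbage collection must not call close().
        pass


def mangle(x):
    # The wrapper is discarded as soon as this function returns.
    NoCloseOnGCTextIOWrapper(x, encoding="utf-8")


raw = io.BytesIO(b"data")
mangle(raw)
gc.collect()  # make the collection explicit

buffer_survived = not raw.closed
```

The trade-off is the one stated above: every read and write goes through Python-level code instead of the C implementation.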


A simple solution would be to return the variable from the function and store it at script scope, so that it is not garbage collected until the script ends or the reference to it changes. There may be other, more elegant solutions.
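A minimal sketch of that idea, assuming a hypothetical helper named `open_decoded` and an in-memory `io.BytesIO` in place of the real file: the function returns the wrapper, so the caller's reference keeps it alive and the buffer stays open.

```python
import io


def open_decoded(raw, encoding):
    # Return the wrapper instead of discarding it, so the caller decides
    # how long it (and therefore the underlying buffer) stays alive.
    return io.TextIOWrapper(raw, encoding=encoding)


raw = io.BytesIO(b"line one\nline two\n")

# Holding `reader` at script scope prevents it from being collected.
reader = open_decoded(raw, "ascii")
first_line = reader.readline()
still_open = not raw.closed
```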


EDIT:

I have found a much better solution (comparatively), but I will leave this answer up in case it is useful for anyone to learn from. (It is a pretty easy way to show off gc.garbage.)

Do not actually use what follows.

OLD:

I found a potential solution, though it is horrible:

What we can do is set up a cyclic reference in the destructor, which will hold off the GC event. We can then look at the garbage attribute of gc to find these unreferenceable objects, break the cycle, and drop that reference.

    In [1]: import io

    In [2]: class MyTextIOWrapper(io.TextIOWrapper):
       ...:     def __del__(self):
       ...:         if not hasattr(self, '_cycle'):
       ...:             print "holding off GC"
       ...:             self._cycle = self
       ...:         else:
       ...:             print "getting GCed!"
       ...:

    In [3]: def mangle(x):
       ...:     MyTextIOWrapper(x)
       ...:

    In [4]: f = io.open('example', mode='rb')

    In [5]: mangle(f)
    holding off GC

    In [6]: f.closed
    Out[6]: False

    In [7]: import gc

    In [8]: gc.garbage
    Out[8]: []

    In [9]: gc.collect()
    Out[9]: 34

    In [10]: gc.garbage
    Out[10]: [<_io.TextIOWrapper name='example' encoding='UTF-8'>]

    In [11]: gc.garbage[0]._cycle=False

    In [12]: del gc.garbage[0]
    getting GCed!

    In [13]: f.closed
    Out[13]: True

Truthfully, this is a pretty horrific workaround, but it could be made transparent to the API I am delivering. Still, I would prefer a way to override the __del__ of IOBase.
