Safely iterating over WeakKeyDictionary and WeakValueDictionary

The Python 3.2 documentation of the weakref module's WeakKeyDictionary and WeakValueDictionary has a note on iterating over these containers:

Note: Because a WeakKeyDictionary is built on top of a Python dictionary, it must not change size when iterating over it. This can be difficult to ensure for a WeakKeyDictionary because actions performed by the program during iteration may cause items in the dictionary to vanish "by magic" (as a side effect of garbage collection).

That sounds rather dire as a specification of these containers' behavior. Especially when running code that relies on CPython's garbage collector (for data structures that contain cycles) or on another Python implementation (such as Jython), it sounds as if there is no safe way to iterate over these collections.

How can I safely iterate over these collections when the garbage collector may clear references at any point in my program? A solution for CPython is my priority, but I'm interested in the problem on other implementations as well.

Is this perhaps a safe way to iterate over a WeakKeyDictionary?

    import weakref

    d = weakref.WeakKeyDictionary()

    ...

    for k, v in list(d.items()):
        ...
+7
4 answers

It is actually safe to iterate over a WeakKeyDictionary, WeakValueDictionary, or WeakSet in Python 2.7 or Python 3.1+. An iteration guard that prevents weakref callbacks from removing references from the underlying dict or set during iteration was added back in 2010, but the docs were never updated.

With the guard in place, if an entry dies before the iteration reaches it, the iteration skips that entry, but this does not result in a segfault, a RuntimeError, or anything else. Dead entries are added to a list of pending removals and handled later.
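The behavior described above can be observed with a small experiment. This is only a sketch for CPython, where dropping the last strong reference frees an object immediately; the `Key` class and the function name are hypothetical helpers introduced for the demonstration.

```python
import gc
import weakref


class Key:
    """Hypothetical key type; instances of plain object() cannot be weakly referenced."""


def iterate_while_keys_die(count=5):
    keys = [Key() for _ in range(count)]
    d = weakref.WeakKeyDictionary()
    for index in range(count):
        d[keys[index]] = index

    seen = 0
    for key in d:
        seen += 1
        # Drop every strong reference except the loop variable, so the
        # remaining keys die while the iterator is still active.
        keys.clear()
        gc.collect()

    # Release the last key as well; no iteration is active any more.
    del key
    gc.collect()
    return seen, len(d)


seen, remaining = iterate_while_keys_die()
print(seen, remaining)
```

No exception is raised even though entries die in the middle of the loop; the dead entries are simply skipped and are gone from the dictionary once the loop has finished.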

Here's the guard (not thread-safe, despite the comment):

    class _IterationGuard:
        # This context manager registers itself in the current iterators of the
        # weak container, such as to delay all removals until the context manager
        # exits.
        # This technique should be relatively thread-safe (since sets are).

        def __init__(self, weakcontainer):
            # Don't create cycles
            self.weakcontainer = ref(weakcontainer)

        def __enter__(self):
            w = self.weakcontainer()
            if w is not None:
                w._iterating.add(self)
            return self

        def __exit__(self, e, t, b):
            w = self.weakcontainer()
            if w is not None:
                s = w._iterating
                s.remove(self)
                if not s:
                    w._commit_removals()

Here's where the WeakKeyDictionary weakref callback checks the guard:

    def remove(k, selfref=ref(self)):
        self = selfref()
        if self is not None:
            if self._iterating:
                self._pending_removals.append(k)
            else:
                del self.data[k]

And here's where WeakKeyDictionary.__iter__ sets the guard:

    def keys(self):
        with _IterationGuard(self):
            for wr in self.data:
                obj = wr()
                if obj is not None:
                    yield obj

    __iter__ = keys

The same guard is used in the other iterators.


If this guard did not exist, calling list(d.items()) would not be safe either. A GC pass could happen inside the items iterator and remove items from the dict during iteration. (The fact that list is written in C would provide no protection.)


In 2.6 and earlier, the safest way to iterate over a WeakKeyDictionary or WeakValueDictionary would have been to use items. There, items returned a list, and it used the underlying dict's items method, which was (mostly?) not interruptible by the GC. The dict API changes in 3.0 changed how keys / values / items work, which is probably why the guard was introduced when it was.

+3

To be safe, you have to keep a reference somewhere. Using the idiom:

    for k, v in list(d.items()):

is not completely safe because, even though it will work most of the time, during the last iteration of the loop the list may be garbage collected.

The correct way is:

    items = list(d.items())
    for k, v in items:
        # do stuff that doesn't have a chance of destroying "items"
        pass
    del items

If you use a WeakKeyDictionary, you can simply store the keys; if you use a WeakValueDictionary, store the values.
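As an illustration of holding strong references for the duration of a loop, here is a minimal sketch for a WeakValueDictionary that keeps only the values alive. The `Payload` class and the function name are hypothetical, and the immediate cleanup after `del values` relies on CPython's reference counting.

```python
import gc
import weakref


class Payload:
    """Hypothetical value type that supports weak references."""


def values_pinned_during_loop(count=3):
    originals = [Payload() for _ in range(count)]
    d = weakref.WeakValueDictionary()
    for index in range(count):
        d[index] = originals[index]

    # Strong references to every value, held in a named variable.
    values = list(d.values())

    # Simulate the rest of the program letting go of the objects.
    originals.clear()
    gc.collect()

    visited = 0
    for key, value in d.items():
        visited += 1  # every entry is still present here
    del value  # the loop variable would otherwise pin the last value

    # Release the pinned values; the entries may now disappear.
    del values
    gc.collect()
    return visited, len(d)


print(values_pinned_during_loop())
```

While `values` exists, no entry can vanish; once it is deleted, the dictionary is free to empty itself.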

On a side note: in Python 2, .items() already returns a list.

Ultimately, it depends on what you mean by "safe". If you simply mean that the iteration will proceed correctly (visiting each element once), then:

    for k, v in list(d.items()):

is safe, because the iteration over the dictionary is actually performed by list(d.items()), and after that you are only iterating over the list.

If you instead mean that, during the iteration, elements should not "disappear" from the dictionary as a side effect of the for loop, then you must keep a strong reference until the end of the loop, which requires storing the list in a variable before starting the loop.

+7

Convert to strong references without using iteration.

    items = []
    while d:
        try:
            items.append(d.popitem())
        except KeyError:
            pass

If the dictionary loses some keys during the while loop, that should not cause problems.

Then you can iterate over items instead. When you're done, call d.update(items) to put the entries back, and then del items.
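Put together, the drain, iterate, and restore steps might look like the following sketch. The helper names `drain` and `visit_and_restore` and the `Key` class are hypothetical; only `popitem` and `update` come from the dictionary itself.

```python
import weakref


class Key:
    """Hypothetical weak-referenceable key type used only for this demo."""


def drain(d):
    """Pop every live entry out of d and return strong (key, value) pairs."""
    items = []
    while d:
        try:
            items.append(d.popitem())
        except KeyError:
            # The dictionary emptied between the truthiness check and popitem().
            pass
    return items


def visit_and_restore(d, visit):
    """Call visit(key, value) for each drained entry, then put the entries back."""
    items = drain(d)
    try:
        for key, value in items:
            visit(key, value)
    finally:
        # Restore the entries even if visit() raised.
        d.update(items)
    return len(items)
```

Note that the dictionary is empty while the loop runs, so any code that looks up entries in it during the visit will not find them.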

+1

Disable the garbage collector.

    import gc

    gc.disable()
    try:
        items = list(d.items())
    finally:
        gc.enable()

Then iterate over items.
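One possible refinement, sketched here with hypothetical names (`gc_paused`, `snapshot_items`), is to wrap the pause in a context manager that re-enables the collector only if it was enabled beforehand. Keep in mind that in CPython gc.disable() only pauses the cyclic collector; objects whose reference count drops to zero are still freed immediately.

```python
import contextlib
import gc


@contextlib.contextmanager
def gc_paused():
    """Pause the cyclic garbage collector, restoring its previous state on exit."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()


def snapshot_items(d):
    """Return a list of d's (key, value) pairs, built while the collector is paused."""
    with gc_paused():
        return list(d.items())
```

The returned list holds strong references, so it can be iterated like the `items` list above.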

0

Source: https://habr.com/ru/post/925251/

