Is collections.defaultdict thread safe?

I have not worked with threads in Python at all, so I am asking this question as a complete newcomer.

I am wondering if defaultdict is thread safe. Let me explain this:

I have

 d = defaultdict(list) 

which creates a list for any missing key by default. Let's say I have several threads doing this at the same time:

 d['key'].append('value') 

With two threads, I should end up with ['value', 'value'] . However, suppose defaultdict is not thread safe, and thread 1 yields to thread 2 after checking if 'key' in dict but before running d['key'] = default_factory() . The two operations then interleave: thread 2 creates a list in d['key'] and possibly appends 'value' to it.

Then, when thread 1 runs again, it continues with d['key'] = default_factory() , which destroys the existing list and its value, and we end up with ['value'] .

I looked at the CPython source code for defaultdict . However, I could not find any locks or mutexes. I assume it is not thread safe unless it is documented to be.

Some people on IRC said last night that Python has a GIL, so it is conceptually thread safe. Others said that threading should not be done in Python at all. I'm pretty confused. Ideas?

1 answer

In this particular case, it is thread safe.

To see why, it is important to understand when Python switches threads. CPython only allows a thread switch between Python bytecode steps. This is where the GIL comes in; every N bytecode instructions (or, in Python 3.2 and later, after a fixed time interval) the lock is released and a thread switch can take place.
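
How often that switch is offered can be inspected at runtime. The following is a minimal sketch, assuming Python 3.2 or later, where the interval is expressed in seconds and read through sys.getswitchinterval() ; the variable name interval is only illustrative.

```python
import sys

# Time, in seconds, after which CPython asks the running thread
# to give up the GIL so that another thread can be scheduled.
interval = sys.getswitchinterval()
print(interval)
```

The value can be changed with sys.setswitchinterval() , but that only affects how often switches are offered, not where they may occur.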

The d['key'] expression is handled by a single bytecode ( BINARY_SUBSCR ), which triggers a call to the dictionary's .__getitem__() method.
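
The single-bytecode claim can be checked with the dis module. This is a small sketch rather than a definitive test: the helper function lookup is made up for illustration, and the exact instruction name depends on the CPython version (older releases report BINARY_SUBSCR , while some newer ones fold subscription into a BINARY_OP instruction).

```python
import dis

def lookup(d):
    # The body is just the subscription expression under discussion.
    return d['key']

# Collect the instruction names the compiler produced for lookup().
opnames = [ins.opname for ins in dis.get_instructions(lookup)]

# Keep only the instruction(s) that perform the subscription itself.
subscript_ops = [name for name in opnames
                 if name in ("BINARY_SUBSCR", "BINARY_OP")]

print(opnames)
print(subscript_ops)
```

Whatever the name, there should be exactly one such instruction, so the lookup cannot be interrupted between Python bytecodes.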

A defaultdict configured with list as the default factory handles the dict.__getitem__() call entirely in C, and the GIL is never released, making the dict[key] lookup thread safe.
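
The scenario from the question can be tried out as follows. This is a sketch under the assumption of a standard CPython build with the GIL enabled; the thread count and the barrier are illustrative choices, and a passing run shows the expected outcome but cannot by itself prove thread safety.

```python
import threading
from collections import defaultdict

N_THREADS = 8                      # illustrative thread count
d = defaultdict(list)              # factory is the C-implemented list type
barrier = threading.Barrier(N_THREADS)

def worker():
    # Hold every thread here until all are ready, so the lookups
    # happen as close together as possible.
    barrier.wait()
    d['key'].append('value')

threads = [threading.Thread(target=worker) for _ in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every thread should have appended to the same list.
print(len(d['key']))
```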

Note the qualification; if you create a defaultdict instance with a different default factory, one that uses Python code (for example, lambda: [1, 2, 3] ), all bets are off, as this means the C code calls back into Python code, and the GIL can be released again while the bytecode for the lambda function executes. Similarly, if the factory is written in C code that explicitly releases the GIL, a thread switch can take place, and thread safety goes out of the window.
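
When the factory is Python code, one common way to restore safety is to guard the access with an explicit lock. The following is a minimal sketch of that approach, not something the defaultdict API provides; the names d_lock and append_value are hypothetical.

```python
import threading
from collections import defaultdict

# A Python-level factory: the GIL may be released while it runs.
d = defaultdict(lambda: [1, 2, 3])
d_lock = threading.Lock()          # hypothetical lock guarding d

def append_value(key, value):
    # Make the lookup, the possible factory call, and the append
    # one critical section, so no other thread can interleave.
    with d_lock:
        d[key].append(value)

threads = [threading.Thread(target=append_value, args=('key', 'value'))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(d['key'])
```

The lock serializes all access through append_value , so it only helps if every thread that touches d goes through it.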
