Before we get to the bigger problems: you are not using Queue.join correctly.
The whole point of that function is that a producer, who adds a bunch of items to a queue, can wait until the consumer or consumers have finished working on all of those items. It works by having each consumer call task_done after finishing work on each item it pulled off with get . Once there have been as many task_done calls as put calls, the queue's work is done. You never call get anywhere, let alone task_done , so the queue can never finish. That is why you block forever after the two threads complete.
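For illustration, here is a minimal sketch of the intended protocol (the consumer function and the None sentinel are my own illustrative choices, not from your code):

```python
import threading
import queue  # "Queue" on Python 2

q = queue.Queue()

def consumer():
    while True:
        item = q.get()          # pull an item off the queue
        if item is None:        # sentinel: no more work coming
            q.task_done()
            break
        print("processed", item)
        q.task_done()           # mark this item as finished

t = threading.Thread(target=consumer)
t.start()

for i in range(10):             # the producer side: put some work
    q.put(i)
q.put(None)                     # tell the consumer to stop

q.join()                        # blocks until every put has a matching task_done
t.join()
```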
The first problem is that your threads do almost no work outside of the actual synchronization. If the only thing they do is fight over the queue, only one of them can make progress at a time.

Of course this is common in toy problems, but you have to think about what your real problem is:
- If you are doing lots of I/O (listening on sockets, waiting for user input, etc.), threads work fine.
- If you are doing lots of CPU work (calculating prime numbers), threads do not help in Python because of the GIL, but processes do.
- If you are mostly synchronizing separate tasks, neither will work well (and processes will be worse). It may still be simpler to think in terms of threads, but it will be the slowest way to do things. You may want to look into coroutines instead; Greg Ewing has a great demo of how to use yield from with coroutines to build things like schedulers or many-actor simulations (a minimal sketch of the idea follows this list).
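This is not Greg Ewing's demo, just a minimal round-robin scheduler over generator-based coroutines to show the flavor of cooperative scheduling; it uses bare yield rather than yield from for brevity, and the names scheduler and counter are mine:

```python
from collections import deque

def scheduler(tasks):
    # Round-robin over generator-based coroutines: run each task up to
    # its next yield, then give the next task a turn.
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the task until it yields
        except StopIteration:
            continue            # task finished; drop it
        ready.append(task)      # otherwise put it back in line

def counter(name, n):
    for i in range(n):
        print(name, i)
        yield                   # cooperative "context switch"

scheduler([counter("A", 3), counter("B", 2)])
```

No locks are needed here because only one task ever runs at a time; the cost is that a task that never yields starves everyone else.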
Next, as I mentioned on your previous question, making threads (or processes) work efficiently with shared state requires holding locks for as short a time as possible.
So, if you have to search a whole queue under a lock, it had better be a constant-time search, not a linear-time one. That is why I suggested using something like an OrderedSet recipe in place of the list-like container inside the stdlib's Queue.Queue . Then this function:
```python
def is_in_queue(x, q):
    with q.mutex:
        return x in q.queue
```
... only locks the queue for a tiny fraction of a second: long enough to look a hash value up in a table, not long enough to compare every element in the queue against x .
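As a sketch of what that could look like (not a stdlib API: the class name SetQueue is mine, it overrides the same internal _init/_put/_get hooks that the stdlib's own LifoQueue and PriorityQueue subclasses override, and it assumes items are hashable and never queued twice at once):

```python
import queue                     # "Queue" on Python 2
from collections import OrderedDict

class SetQueue(queue.Queue):
    """A Queue whose internal storage allows O(1) membership tests."""

    def _init(self, maxsize):
        self.queue = OrderedDict()    # keeps FIFO order, hashes its keys

    def _put(self, item):
        self.queue[item] = None

    def _get(self):
        return self.queue.popitem(last=False)[0]  # pop the oldest item
```

With this, the x in q.queue test in is_in_queue becomes a constant-time hash lookup instead of a linear scan.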
Finally, I tried to explain race conditions on your other question, but let me try again.
You need a lock around every complete transaction in your code, not just around the individual operations.
For example, if you do this:
```
with queue locked:
    see if x is in the queue
if x was not in the queue:
    with queue locked:
        add x to the queue
```
... then it is always possible that x was not in the queue when you checked, but by the time you grabbed the lock again, someone else had added it. That is why it is possible for both threads to think they got there first, and both add x.
To fix this, you need to put a lock around everything:
```
with queue locked:
    if x is not in the queue:
        add x to the queue
```
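In real code that could look like the following; put_unique is a hypothetical helper name, and the sketch leans on the same undocumented CPython queue internals ( mutex , _put , unfinished_tasks , not_empty ) as the is_in_queue snippet above, ignoring maxsize bookkeeping:

```python
def put_unique(x, q):
    # One lock held across the entire check-then-add transaction, so no
    # other thread can insert x between the membership test and the put.
    with q.mutex:
        if x in q.queue:
            return False
        q._put(x)                 # add the item (bypasses put()'s own locking)
        q.unfinished_tasks += 1   # keep q.join() bookkeeping consistent
        q.not_empty.notify()      # wake one blocked q.get(), if any
        return True
```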
Of course, this directly contradicts what I said earlier about locking the queue as briefly as possible. That, in a nutshell, is what makes multithreading hard. It is easy to write safe code that simply locks everything for as long as could conceivably be necessary, but then your code ends up using only one core while all the other threads sit blocked waiting for the lock. And it is easy to write fast code that simply locks everything as briefly as possible, but then it is unsafe, and you get garbage values or even crashes all over the place. Figuring out what needs to be a transaction, how to minimize the work inside those transactions, and how to handle the multiple locks you will probably need to make that work without deadlocking... that is not so easy.