How to convert Python threading code to multiprocessing code?

I need to convert a threading application to a multiprocessing application for several reasons (GIL, memory leak). Fortunately, the threads are pretty isolated and communicate only through Queue.Queue instances. This primitive is also available in multiprocessing, so everything looks great. Now, before I enter this minefield, I would like to get some advice on the problems ahead:

  • How do I make sure my objects can be transferred through a Queue? Do I need to provide some __setstate__?
  • Can I rely on put returning instantly (as with threading Queues)?
  • General tips / tricks?
  • Anything worth reading apart from the Python documentation?
1 answer

Answer to part 1:

Everything that goes through a multiprocessing.Queue (or a Pipe, etc.) has to be picklable. This includes basic types such as tuples, lists and dicts. Classes are also supported if they are defined at the top level of a module and not too complex (see the pickle documentation for details). Trying to pass a lambda around, however, will fail.
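A quick way to check this in advance is a pickle round trip, since that is exactly what the queue does on put/get. A minimal sketch (the Job class here is a hypothetical stand-in for your own objects):

```python
import pickle

# Hypothetical example class; picklable because it is defined at module
# top level and holds only picklable attributes.
class Job:
    def __init__(self, payload):
        self.payload = payload

def survives_pickle(obj):
    """multiprocessing.Queue pickles on put and unpickles on get;
    this round trip is the same test."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except Exception:
        return False

print(survives_pickle((1, "a", [2.0])))   # basic types: True
print(survives_pickle(Job([1, 2, 3])))    # top-level class: True
print(survives_pickle(lambda x: x + 1))   # lambda: False
```

If the round trip fails for one of your classes, implementing __getstate__/__setstate__ to reduce it to picklable parts is usually the fix.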

Answer to part 2:

A put consists of two parts: a semaphore is needed to modify the queue, and it may start the feeder thread. So as long as no other Process is trying to put to the same Queue at the same time (for instance, because there is only one Process writing to it), it should be fast. For me it turned out to be fast enough for all practical purposes.
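You can observe this yourself: put only appends the item to an in-process buffer and returns, while a background feeder thread does the actual pickling and sending. A small timing sketch:

```python
import time
from multiprocessing import Queue

q = Queue()
q.put("warm-up")                 # the first put lazily starts the feeder thread
assert q.get(timeout=1) == "warm-up"

t0 = time.perf_counter()
q.put({"job": 1})                # grabs a lock, buffers the item locally, returns
elapsed = time.perf_counter() - t0
print(f"put returned after {elapsed * 1000:.3f} ms")

item = q.get(timeout=1)          # the feeder thread pickled and sent it meanwhile
```

The flip side of this asynchrony is that an item put on the queue is not necessarily visible to consumers the instant put returns.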

Partial answer to part 3:

  • The plain multiprocessing.Queue lacks a task_done method, so it cannot be used as a drop-in replacement for queue.Queue. (The multiprocessing.JoinableQueue subclass provides it.)
  • The old processing.Queue lacked a qsize method, and the qsize of the newer multiprocessing version is inaccurate (just keep that in mind).
  • Since file descriptors are normally inherited on fork, care must be taken to close them in the correct processes.
