How to handle blocking I/O in mod_wsgi / Django?

I run Django under Apache + mod_wsgi in daemon mode with the following configuration:

WSGIDaemonProcess myserver processes=2 threads=15 

My application performs several I/O operations on the server, which may take several seconds each.

    def my_django_view(request):
        content = ...  # do some slow processing on a backend file
        return HttpResponse(content)

It seems that if I issue more than two concurrent HTTP requests that perform this kind of I/O, Django simply blocks until one of the earlier requests completes.

Is this the expected behavior? Shouldn't the 15 threads let me handle up to 15 separate requests per WSGI process before I see this kind of wait?

Or am I missing something?

3 answers

If the processing is done in Python, the Global Interpreter Lock will not be released: within one Python process, only one thread can be executing Python code at a time. The GIL is usually released inside C code, though, for example during most I/O operations.
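To see the difference this makes, here is a minimal sketch of my own (not from the answer): two threads doing blocking I/O overlap almost completely because the blocking call releases the GIL, whereas two threads of pure Python computation would not.

    import threading
    import time

    def blocking_io():
        time.sleep(1)  # stands in for a file read or socket call; releases the GIL

    start = time.time()
    threads = [threading.Thread(target=blocking_io) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("elapsed: %.1fs" % (time.time() - start))  # ~1.0s, not ~2.0s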

If this type of processing happens a lot, you might consider running a second "worker" application as a daemon that reads tasks from the database, performs the operations, and writes the results back to the database (a rough sketch follows below). Apache may decide to kill processes that take too long to respond.
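A rough sketch of that worker-daemon idea, assuming a hypothetical Task model with status, payload, and result fields; the web view only inserts rows, and this loop runs outside Apache so request threads never block on the slow work.

    import time
    from myapp.models import Task  # hypothetical app and model

    def do_heavy_io(payload):
        ...  # the slow backend-file processing goes here

    def run_worker():
        while True:
            task = Task.objects.filter(status="pending").first()
            if task is None:
                time.sleep(1)  # queue empty; poll again shortly
                continue
            task.status = "running"
            task.save()
            task.result = do_heavy_io(task.payload)
            task.status = "done"
            task.save()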


+1 to Radomir Dopieralski's answer.

If the task is long-running, you should delegate it to a process outside the request-response cycle, either via standard cron or a distributed task queue such as Celery, e.g. as sketched below.
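A minimal Celery sketch of that idea; the broker URL and task body are placeholder assumptions. The view queues the work and returns immediately while a Celery worker does the I/O.

    from celery import Celery

    app = Celery("tasks", broker="redis://localhost:6379/0")  # assumed broker

    @app.task
    def process_backend_file(path):
        # the slow I/O that used to block the request thread
        with open(path) as f:
            return f.read()

    # In the Django view: queue the work and respond right away.
    # process_backend_file.delay("/path/to/file")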


The advice above about offloading the workload was good in 2010, and it is still a good idea, but things have moved on since then.

We use Apache Kafka as a queue to store our in-flight workload, so the data flow is now:

User -> Apache httpd -> Kafka -> python daemon processor

The user operation submits data for processing through the WSGI application, which simply writes it to the Kafka queue as fast as possible. A minimal sanity check is performed during the request to keep it quick while still catching obvious problems. Kafka stores the data very quickly, so the HTTP response is zippy.
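A sketch of that "write fast to Kafka and return" view using the kafka-python client (the answer doesn't name a client library, and the broker address and topic name are assumptions).

    import json
    from django.http import HttpResponse
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=["kafka1:9092"],  # assumed broker address
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    def my_django_view(request):
        payload = {"body": request.body.decode("utf-8", "replace")}
        if not payload["body"]:  # minimal sanity check, as described above
            return HttpResponse(status=400)
        producer.send("work-items", payload)  # fast append; no waiting for workers
        return HttpResponse("queued", status=202)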

A separate set of Python daemons pulls the data from Kafka and processes it. We actually have several processes that need to handle the data differently, and Kafka makes that fast: the data is written once, and multiple readers can each read the same data if necessary, with no penalty for duplicate storage.
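A matching consumer-daemon sketch with kafka-python; each distinct group_id receives its own full copy of the stream, which is how several processors can read the same data independently. The handler function and names are hypothetical.

    import json
    from kafka import KafkaConsumer

    def handle_work_item(item):
        ...  # the slow processing goes here

    consumer = KafkaConsumer(
        "work-items",
        bootstrap_servers=["kafka1:9092"],  # assumed broker address
        group_id="file-processor",  # a second daemon would use its own group_id
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )

    for message in consumer:
        handle_work_item(message.value)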

This gives very fast turnaround and optimal use of resources, since we have other offline boxes handling the pull-from-Kafka work and can scale them to reduce lag as needed. Kafka is highly available, with the same data written to several boxes in the cluster, so my manager doesn't complain about the "but what happens if" scenarios.

We are pleased with Kafka. http://kafka.apache.org

