The rationale for offloading the workload was sound in 2010 and is still a good idea, but we have made some progress since then.
We use Apache Kafka as a queue to hold our in-flight workload, so the data flow is now:
User -> Apache httpd -> Kafka -> python daemon processor
A user operation submits data for processing through the WSGI application, which simply writes it to the Kafka queue as fast as it can. Only a minimal sanity check is done during the request, enough to catch obvious problems while keeping the request quick. Kafka stores data very fast, so the HTTP response is zippy.
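The front end amounts to something like this (a minimal sketch, not our exact code: the kafka-python client, the topic name 'incoming', and the JSON sanity check are illustrative choices):

    import json
    from kafka import KafkaProducer

    # One shared producer; it batches and sends in the background.
    producer = KafkaProducer(
        bootstrap_servers='kafka:9092',
        value_serializer=lambda v: json.dumps(v).encode('utf-8'),
    )

    def application(environ, start_response):
        """WSGI entry point: sanity-check the payload, enqueue it, return fast."""
        size = int(environ.get('CONTENT_LENGTH') or 0)
        body = environ['wsgi.input'].read(size)
        try:
            payload = json.loads(body)       # the "obvious problems" check
        except ValueError:
            start_response('400 Bad Request', [('Content-Type', 'text/plain')])
            return [b'invalid payload\n']
        producer.send('incoming', payload)   # fire-and-forget; Kafka does the rest
        start_response('202 Accepted', [('Content-Type', 'text/plain')])
        return [b'queued\n']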
A separate set of Python daemons pulls data from Kafka and processes it. We actually have several processes that each need to handle the data differently, and Kafka makes that cheap: the data is written once, and multiple readers can each read the same stream; no penalty for duplicate storage is incurred.
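One such daemon might look like this (again a sketch with kafka-python; the group name and the process() placeholder are made up). Giving each kind of processor its own group_id is what lets them all read the same stream independently:

    import json
    from kafka import KafkaConsumer

    # Each processor type uses its own consumer group, so every group sees
    # every message; within a group, partitions are split across instances.
    consumer = KafkaConsumer(
        'incoming',
        group_id='indexer',                  # e.g. 'indexer', 'archiver', ...
        bootstrap_servers='kafka:9092',
        value_deserializer=lambda b: json.loads(b.decode('utf-8')),
    )

    def process(payload):
        # Placeholder for this daemon's actual work.
        print(payload)

    for message in consumer:
        process(message.value)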
This gives very fast turnaround and efficient use of resources, since the pull-from-Kafka work runs on separate offline boxes, and we can add more of them to reduce the lag as needed. Kafka is HA, with the same data written to multiple boxes in the cluster, so my manager doesn't complain about the "what happens if" scenarios.
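The HA part is a property of the topic itself. Sketched below with kafka-python's admin client; the replication factor of 3 and the 8 partitions are illustrative numbers, not our actual settings:

    from kafka.admin import KafkaAdminClient, NewTopic

    admin = KafkaAdminClient(bootstrap_servers='kafka:9092')

    # replication_factor=3: each message lives on three brokers, so losing
    # one box answers the "what happens if" question. num_partitions bounds
    # how many consumers per group can pull in parallel (the lag knob).
    admin.create_topics([
        NewTopic(name='incoming', num_partitions=8, replication_factor=3),
    ])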
We are pleased with Kafka. http://kafka.apache.org