I am building a task-queue-based application: it serves a number of tasks to several asynchronously connected clients. The twist is that the tasks must be handed out in random order.
My problem is that the algorithm I am using now is computationally expensive, because it relies on many large queries to and transfers from the database. I have a strong hunch that there is a cheaper way to achieve the same result, but I cannot quite see it. Can you come up with a smarter solution to this problem?
Here is the (computationally expensive) algorithm I am using now:
When a client requests a new task ...
1. Query the database for all incomplete tasks
2. Put all those tasks in a list
3. Shuffle the list (using random.shuffle)
4. Mark the first task as "in progress"
5. Send that task's parameters to the client to complete
When the client completes the task ...
6a. Record the result and mark the task as "completed".
If the client does not complete the task in a certain time ...
6b. Reset the task's flag to "incomplete".
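For concreteness, here is a minimal in-memory model of the flow above (in reality every step is a MongoDB round trip, and all field names here are made up for illustration):

```python
import random

# Stand-in for the MongoDB collection of tasks.
tasks = [{"id": i, "status": "incomplete"} for i in range(10)]

def request_task():
    # Steps 1-3: fetch EVERY incomplete task and shuffle the whole list.
    # This full scan plus O(n) shuffle is the expensive part.
    incomplete = [t for t in tasks if t["status"] == "incomplete"]
    random.shuffle(incomplete)
    if not incomplete:
        return None
    task = incomplete[0]
    task["status"] = "in_progress"   # step 4
    return task                      # step 5: parameters go to the client

def complete_task(task, result):
    task["status"] = "completed"     # step 6a
    task["result"] = result

def timeout_task(task):
    task["status"] = "incomplete"    # step 6b
```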
It looks like we could do better by replacing steps 1, 2, and 3 with a pseudo-random sequence or a hash function, but I cannot work out the full solution. Ideas?
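One way the pseudo-random-sequence idea could cash out (a sketch under assumptions, not a full answer): give every task a random key once, keep an index sorted by that key, and claim a task by drawing a random number and taking the first incomplete task at or after it in key order, wrapping around. That replaces fetch-all-plus-shuffle with one indexed lookup; in MongoDB the claim step would map to a single `find_one_and_update` on an indexed `rand` field (pymongo has `find_one_and_update`). In-memory, with hypothetical names:

```python
import bisect
import random

# Each task gets a fixed random key at setup time.
tasks = {i: {"id": i, "rand": random.random(), "status": "incomplete"}
         for i in range(10)}
# Sorted (key, id) pairs stand in for a database index on "rand".
index = sorted((t["rand"], t["id"]) for t in tasks.values())

def claim_random_task():
    r = random.random()
    pos = bisect.bisect_left(index, (r,))  # jump to a random point in key order
    n = len(index)
    for k in range(n):                     # scan forward, wrapping around once
        _, tid = index[(pos + k) % n]
        if tasks[tid]["status"] == "incomplete":
            tasks[tid]["status"] = "in_progress"
            return tasks[tid]
    return None                            # nothing left to claim
```

One caveat: because the keys are fixed, the wrap-around order is fixed too, so successive claims follow one pre-shuffled ordering rather than a fresh shuffle each time; whether that is random enough depends on your requirements.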
Other considerations:
- In case this is important, I use Python and MongoDB for all of this. (MongoDB doesn't have some kind of smart find_one variant that returns a random matching document, does it?)
- The term "queue" is a little misleading: all tasks are stored in subfields of a single MongoDB collection. The length (total number of tasks) of the collection is known and fixed from the very beginning.
- If necessary, assigning the same task more than once is allowed, but such instances should be very rare, because executing each task is expensive.
- I have information about each client, so we know exactly who is making each task request.