Multiprocessor Job Scheduling Algorithm

I am curious to find out if there is a widespread solution for managing thread resources in a thread pool, given the following scenarios / limitations:

  • Incoming jobs are all of the same nature and can be processed by any thread in the pool.
  • Incoming tasks will be "posted" in different queues based on some attribute of the incoming task, so that all tasks moving to the same bucket / queue MUST be processed sequentially.
  • Some buckets will be less busy than others at different points during the life of the program.

My question is about threadpool implementation theory. What algorithm can be used to efficiently distribute available threads for incoming tasks in all buckets?

Edit: Another design goal is to minimize the latency between a job arriving in a queue and a thread picking it up, subject to the availability of free threads.

Edit 2: Assume a relatively large number of queues (50-100) with unpredictable levels of activity; probably only about 25% of them will be active at any given time.

The first (and most expensive) solution I can think of is to simply assign one thread to each queue. Although this ensures that incoming jobs are picked up immediately, it is clearly inefficient.

The second solution is to combine queues based on expected activity levels so that the number of queues matches the number of threads in the pool, allowing one thread per queue. The problem here is that incoming jobs that could otherwise be processed in parallel will have to wait for each other.

The third solution is to create the maximum number of queues, one for each set of tasks that must be processed sequentially, but to keep the number of threads much smaller (the thread count could also be tuned by the pool at runtime). So my question arises here: given that we have more queues than threads, how does the pool most efficiently distribute threads to incoming jobs?

I would like to know if there is a generally accepted approach. Or, if there are different approaches - who uses which? What are the advantages / disadvantages etc.?

Edit3 : this can be best expressed in pseudo-code.
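One way to express the setup as code (a minimal illustrative sketch; the names `Bucket`, `Router`, `post`, and `drain` are made up for illustration and not from any real library):

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <map>

// Each bucket holds a FIFO of jobs that must run sequentially.
// Jobs are routed to a bucket by some key attribute of the job.
using Job = std::function<void()>;

struct Bucket {
    std::deque<Job> fifo;  // jobs in this bucket run strictly in order
};

struct Router {
    std::map<int, Bucket> buckets;  // keyed by the routing attribute

    // Post a job: all jobs with the same key land in the same bucket.
    void post(int key, Job job) {
        buckets[key].fifo.push_back(std::move(job));
    }

    // Drain one bucket sequentially (what a single worker would do).
    void drain(int key) {
        Bucket &b = buckets[key];
        while (!b.fifo.empty()) {
            b.fifo.front()();
            b.fifo.pop_front();
        }
    }
};
```

The open question is then purely about which thread calls `drain` on which bucket, and when.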

3 answers

You should probably eliminate option 2 from your specification. All you really need is for threads to claim buckets and process the queue inside each bucket in order. It makes no sense to process a serialized queue with more than one thread, or to serialize tasks that could otherwise run in parallel. So your specification simply becomes: threads iterate over buckets and drain each bucket's FIFO, while the pool manager inserts correctly constructed buckets. Your bucket would then be:

```c
struct task_bucket {
    void *ctx;      /* context-relevant data */
    fifo_t *queue;  /* your FIFO of pending tasks */
};
```

Then you need to make the thread pool smart enough to know what to do at each iteration over the queue. For example, ctx may be a function pointer and the queue may contain the data for that function, so the worker thread simply calls the function at each iteration with the data provided.
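A sketch of that idea (hedged: `fifo_t` is replaced by a `std::deque` purely for illustration, and `process_bucket` is a made-up name):

```cpp
#include <cassert>
#include <deque>

// ctx is a function pointer; the queue holds the per-task data.
// The worker just calls ctx on each item, preserving FIFO order.
typedef void (*task_fn)(void *data);

struct task_bucket {
    task_fn ctx;               // what to do with each queued item
    std::deque<void *> queue;  // per-task data, processed in order
};

// One worker pass: drain the bucket, invoking ctx per item.
// Sequential by construction, as long as one thread drains at a time.
inline void process_bucket(task_bucket *b) {
    while (!b->queue.empty()) {
        void *data = b->queue.front();
        b->queue.pop_front();
        b->ctx(data);
    }
}
```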

In response to the comments: if the size of the bucket list is known in advance and is unlikely to change over the life of the program, you need to decide whether that matters to you. Either way, the threads need some way to select a bucket to work on. The easiest way is a FIFO queue of ready buckets that the manager populates and the worker threads drain: the classic reader/writer setup.

Another possibility is a heap. A worker removes the highest-priority bucket from the heap and processes that bucket's queue. Both removal by workers and insertion by the manager reorder the heap, so the root node is always the highest priority.

Both of these strategies assume that workers discard buckets once drained and the manager creates new ones.
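The FIFO-of-ready-buckets strategy might be sketched like this (assumed names; a standard producer/consumer queue with a mutex and condition variable, requiring C++17 for `std::optional`):

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <optional>

// The manager pushes ready buckets; idle workers pop and drain them.
// Each entry is a callable that drains exactly one bucket.
struct BucketQueue {
    std::mutex m;
    std::condition_variable cv;
    std::deque<std::function<void()>> ready;
    bool closed = false;

    void push(std::function<void()> drain_bucket) {
        {
            std::lock_guard<std::mutex> lk(m);
            ready.push_back(std::move(drain_bucket));
        }
        cv.notify_one();  // wake one idle worker
    }

    // Worker side: block until a bucket is ready (or the pool shuts down).
    std::optional<std::function<void()>> pop() {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [this] { return closed || !ready.empty(); });
        if (ready.empty()) return std::nullopt;  // closed and fully drained
        auto f = std::move(ready.front());
        ready.pop_front();
        return f;
    }

    void close() {
        {
            std::lock_guard<std::mutex> lk(m);
            closed = true;
        }
        cv.notify_all();
    }
};
```

A worker loop is then just `while (auto f = q.pop()) (*f)();`. The heap variant only changes the container and pop order, not this shape.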

If keeping buckets alive matters, you risk workers only ever visiting the most recently modified buckets. To avoid starving the rest, the manager must either reorder the bucket list or adjust each bucket's priority, with iterating workers always looking for the highest priority. It is also important that the ctx memory stays valid while threads are working on it, or else threads must copy it. Workers can simply take the queue into a local variable and set the queue pointer in the bucket to NULL.


ADDED: I now tend to agree that you can simply keep a separate thread for each bucket, and only look for something else if that simple solution turns out to have problems. The best alternative may depend on exactly what kind of problems the simple one causes.

In any case, I leave my original answer below, with an afterthought added.


You can create a special global signal queue of messages of the form "a task is available in bucket X".

All idle workers wait on this queue; when a signal is placed in it, one thread takes the signal and goes to the corresponding bucket to process jobs there until the bucket becomes empty.

When an incoming job is posted to a bucket, check whether a worker thread is already assigned to that bucket. If one is, the new job will eventually be processed by that worker, so no signal needs to be sent. If no worker is assigned, check whether the bucket was empty before the job arrived. If it was, put a signal for this bucket into the global signal queue; if it was not, such a signal must already have been sent and a worker thread should arrive soon, so do nothing.
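The posting logic above can be sketched as follows (a single-threaded illustration of just the bookkeeping, with no locking; `Dispatcher`, `post`, and the use of `int` payloads are all made-up simplifications):

```cpp
#include <cassert>
#include <deque>
#include <map>

// Signal the global queue only when a job lands in a bucket that is
// empty and has no worker currently assigned; otherwise a worker is
// already (or soon will be) draining that bucket.
struct Bucket {
    std::deque<int> jobs;  // payloads, processed in order
    bool worker_assigned = false;
};

struct Dispatcher {
    std::map<int, Bucket> buckets;
    std::deque<int> signals;  // global queue of "bucket X has work"

    void post(int key, int job) {
        Bucket &b = buckets[key];
        bool was_empty = b.jobs.empty();
        b.jobs.push_back(job);
        if (!b.worker_assigned && was_empty)
            signals.push_back(key);  // exactly one signal per idle burst
    }
};
```

In a real implementation `signals` would be the condition-variable-protected queue the idle workers block on, and `worker_assigned` would be updated under the same lock.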

ADDED: I realized that my idea above could starve some tasks if the number of threads is less than the number of "active" buckets and there is an endless stream of incoming tasks. If all threads are already taken and a new task arrives in a bucket that has not yet been signaled, it may take a long time before a thread frees up to work on it. So you would need to check whether there are idle workers and, if not, create a new one... which adds more complexity.


Keep it simple: I would use one thread per queue. Simplicity is worth a lot, and threads are pretty cheap. 100 threads will not be a problem for most operating systems.

With a thread per queue you also get a real scheduler. If a thread blocks (depending on what you are doing), another queue's thread can run; you will not get stuck unless every thread is blocked. The same cannot be said if you use fewer threads: if queues share threads as the serving unit, then other queues can be starved while a thread is blocked, and if unblocking that thread depends on progress in one of those starved queues, you have a deadlock.

Now, in particular scenarios, using a thread pool may be worth it. But then you are talking about optimizing a particular system, and the details matter. How expensive are threads? How good is the scheduler? What about blocking? How long are the queues, how often are they updated, and so on?

So in the general case, knowing only that you have about 100 queues, I would just go for a thread per queue. Yes, there is some overhead: all solutions have some. Sharing threads brings synchronization problems and its own overhead, while the cost of a bounded number of extra threads is pretty negligible. You are mainly talking about 100 MB of address space for stacks, not necessarily resident memory. If you know that most queues will be inactive, you can optimize further by stopping threads on empty queues and starting them when necessary (but beware of race conditions and thrashing).
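A minimal thread-per-queue sketch (the class name `SerialQueue` is made up; each queue owns one dedicated worker that the OS parks while the queue is idle):

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

// One dedicated thread per queue. Jobs posted to the same queue run
// strictly in FIFO order; the destructor drains remaining jobs and joins.
class SerialQueue {
public:
    SerialQueue() : worker_([this] { run(); }) {}

    ~SerialQueue() {
        {
            std::lock_guard<std::mutex> lk(m_);
            stop_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    void post(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lk(m_);
            jobs_.push_back(std::move(job));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::unique_lock<std::mutex> lk(m_);
            cv_.wait(lk, [this] { return stop_ || !jobs_.empty(); });
            if (jobs_.empty()) return;  // stop requested, nothing left
            auto job = std::move(jobs_.front());
            jobs_.pop_front();
            lk.unlock();
            job();  // run outside the lock; order within the queue is FIFO
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> jobs_;
    bool stop_ = false;
    std::thread worker_;  // declared last so state is ready before it starts
};
```

The stop-then-start optimization mentioned above would replace the infinite `cv_.wait` with a timed wait that lets the thread exit after some idle period.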

