How to prioritize (or set a scheduling policy for) the "manager" and "worker" threads of a process?

I run a process (on a Linux 3.x-based OS) in which:

  • Several threads are "manager" threads (for simplicity, assume they decide which worker threads should do what, but they do no I/O, and the CPU time they need is generally shorter, or much shorter, than that of the workers).
  • More threads are "worker" threads: they do the heavy lifting, and I have no problem with them being preempted at any time.

Oversubscription is possible (i.e. more worker threads than twice the number of cores on an Intel processor with HT). Now I see that the "manager" threads often do not get CPU time. They are not completely "starved"; I just want to give them a boost. So, naturally, I thought about setting thread priorities (I'm on Linux), but then I noticed the different choices of thread schedulers and their effects. At this point I got confused, or rather, the following is not clear to me:

  • Which scheduling policy should I choose for the managers, and which for the workers?
  • How should I set the thread priorities (if at all)?
  • Do I need to have my threads call sched_yield() occasionally?

Notes:

  • I intentionally say nothing about the language or the thread-pool mechanism. I want to ask this question in a more general setting.
  • Do not make assumptions about the CPU cores. There may be many of them, or maybe only one, and maybe I need workers (or workers and managers) on every core.
  • Worker threads may or may not perform I/O. However, answers for the case where they do no I/O are welcome.
  • I don't really need the system to be very responsive apart from running my application. I mean, I would prefer to be able to SSH in and have my typing echoed back without significant delay, but there are no real constraints there.
3 answers

UPD 02/12/2015. I did some experiments.

Theory

There is an obvious solution: move the "manager" threads to the RT scheduler (the real-time scheduling class, which provides the SCHED_FIFO / SCHED_RR policies). In this case, the "manager" threads will always have a higher priority than most threads in the system, so they will almost always get a CPU when they need it.
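A minimal sketch of that option in C, assuming each manager thread calls it on itself at startup. The helper name manager_use_fifo is hypothetical, and the priority value is up to the caller (the experiments below used prio=9). Switching to a real-time policy normally requires CAP_SYS_NICE or a suitable RLIMIT_RTPRIO, so the call can fail with EPERM.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>

/* Hypothetical helper: move the calling thread to SCHED_FIFO.
 * Returns 0 on success, or the errno value on failure
 * (typically EPERM without real-time privileges). */
int manager_use_fifo(int priority)
{
    struct sched_param sp = { .sched_priority = priority };

    /* On Linux, pid 0 addresses the calling thread, not the whole process. */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1)
        return errno;
    return 0;
}
```

A SCHED_FIFO thread that never blocks can monopolize its CPU, so this only fits managers that really do run for short bursts.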

However, there is another solution that lets you stay on the CFS scheduler. Your description of the purpose of the "worker" threads resembles batch scheduling (in ancient times, when computers were large, a user had to put a job into a queue and wait hours until it was done). Linux CFS supports batch jobs through the SCHED_BATCH policy and interactive jobs through the SCHED_NORMAL policy.
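As a rough sketch of the CFS-only variant, a worker thread could switch itself to SCHED_BATCH while the managers stay on the default SCHED_NORMAL. The helper name worker_use_batch is hypothetical; glibc exposes SCHED_BATCH when _GNU_SOURCE is defined.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>

/* Hypothetical helper: mark the calling thread as a batch job.
 * Returns 0 on success, or the errno value on failure. */
int worker_use_batch(void)
{
    /* sched_priority must be 0 for non-real-time policies. */
    struct sched_param sp = { .sched_priority = 0 };

    /* pid 0 means "the calling thread" on Linux. */
    if (sched_setscheduler(0, SCHED_BATCH, &sp) == -1)
        return errno;
    return 0;
}
```

Unlike SCHED_FIFO, moving from SCHED_NORMAL to SCHED_BATCH does not normally need extra privileges.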

There is a useful comment in the kernel code (kernel/sched/fair.c):

    /*
     * Batch and idle tasks do not preempt non-idle tasks (their preemption
     * is driven by the tick):
     */
    if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
        return;

So when a "manager" thread or some other event wakes up a "worker", the worker gets a CPU only if there are free CPUs in the system or when the "manager" exhausts its timeslice (to tune that, change the weight of the task).

It seems that your problem cannot be solved without changing scheduler policies. If the "worker" threads are very busy and the "manager" threads wake up rarely, they will receive the same vruntime bonus, so a "worker" would always preempt the "manager" threads (but you can increase their weight so that they exhaust their bonus faster).

Experiment

I have a server with 2 x Intel Xeon E5-2420 CPUs, which gives us 24 hardware threads. To simulate the two thread pools, I used my own TSLoad workload generator (and fixed a couple of bugs while running the experiments :)).

There were two thread pools: tp_manager with 4 threads and tp_worker with 30 threads, both running busy_wait workloads (just for(i = 0; i < N; ++i); ) but with a different number of loop cycles. tp_worker works in benchmark mode, so it runs as many requests as it can and occupies 100% of the CPU.

Here is a config example: https://gist.github.com/myaut/ad946e89cb56b0d4acde

3.12 (vanilla with debug configuration)

 EXP |             MANAGER              |           WORKER
     | sched           wait    service  | sched              service
     | policy          time    time     | policy             time
  33 | NORMAL          0.045   2.620    | WAS NOT RUNNING
  34 | NORMAL          0.131   4.007    | NORMAL             125.192
  35 | NORMAL          0.123   4.007    | BATCH              125.143
  36 | NORMAL          0.026   4.007    | BATCH (nice=10)    125.296
  37 | NORMAL          0.025   3.978    | BATCH (nice=19)    125.223
  38 | FIFO (prio=9)  -0.022   3.991    | NORMAL             125.187
  39 | core:0:0        0.037   2.929    | !core:0:0          136.719

3.2 (Debian stock)

 EXP |             MANAGER              |           WORKER
     | sched           wait    service  | sched              service
     | policy          time    time     | policy             time
  46 | NORMAL          0.032   2.589    | WAS NOT RUNNING
  45 | NORMAL          0.081   4.001    | NORMAL             125.140
  47 | NORMAL          0.048   3.998    | BATCH              125.205
  50 | NORMAL          0.023   3.994    | BATCH (nice=10)    125.202
  48 | NORMAL          0.033   3.996    | BATCH (nice=19)    125.223
  42 | FIFO (prio=9)  -0.008   4.016    | NORMAL             125.110
  39 | core:0:0        0.035   2.930    | !core:0:0          135.990

Some notes:

  • All times are in milliseconds.
  • The last experiment sets CPU affinity (suggested by @PhilippClaßen): manager threads are bound to core #0, while worker threads are bound to all cores except core #0.
  • Service time for the manager threads doubled once the workers were running, which is explained by contention inside the cores (the processor has Hyper-Threading!).
  • Using SCHED_BATCH + nice (TSLoad cannot set the weight directly, but nice can do it indirectly) slightly reduces the wait time.
  • The negative wait time in the SCHED_FIFO experiment is OK: TSLoad reserves 30us so it can do preliminary work, give the scheduler time to do a context switch, etc. SCHED_FIFO seems to be very fast.
  • Reserving a single core is not that bad, and because it removes the in-core contention, service time decreased significantly.
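
The SCHED_BATCH + nice combination from the notes above could look roughly like this in C. The helper name worker_use_batch_nice is hypothetical. It relies on Linux-specific behaviour: the nice value is a per-thread attribute there, and passing 0 as the id targets the caller.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <sys/resource.h>

/* Hypothetical helper: make the calling thread a batch job and lower its
 * CFS weight through the nice value (10 and 19 were used in the tables).
 * Returns 0 on success, or the errno value of the call that failed. */
int worker_use_batch_nice(int nice_value)
{
    struct sched_param sp = { .sched_priority = 0 };

    if (sched_setscheduler(0, SCHED_BATCH, &sp) == -1)
        return errno;

    /* Raising the nice value (lowering priority) needs no privileges. */
    if (setpriority(PRIO_PROCESS, 0, nice_value) == -1)
        return errno;
    return 0;
}
```

Going back to a lower nice value later usually requires privileges, so this is best treated as a one-way setting for the worker.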

In addition to myaut's answer, you can also bind the managers to specific CPUs ( sched_setaffinity ) and the workers to the rest. Depending on your exact use case, that can be very wasteful, of course.
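
A small sketch of that binding in C, using sched_setaffinity on the calling thread. The helper names pin_to_cpu and avoid_cpu are hypothetical, and the CPU numbers are logical CPUs, so on a Hyper-Threading machine a physical core shows up as more than one number.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>

/* Hypothetical helper for managers: run the calling thread on one CPU only.
 * Returns 0 on success, or the errno value on failure. */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* pid 0 means "the calling thread" on Linux. */
    if (sched_setaffinity(0, sizeof(set), &set) == -1)
        return errno;
    return 0;
}

/* Hypothetical helper for workers: keep the current CPU mask but drop one
 * CPU from it. Returns EINVAL if no CPU would be left (e.g. single core). */
int avoid_cpu(int cpu)
{
    cpu_set_t set;

    if (sched_getaffinity(0, sizeof(set), &set) == -1)
        return errno;
    CPU_CLR(cpu, &set);
    if (CPU_COUNT(&set) == 0)
        return EINVAL;
    if (sched_setaffinity(0, sizeof(set), &set) == -1)
        return errno;
    return 0;
}
```

A manager thread would call pin_to_cpu(0) and each worker avoid_cpu(0). The single-core case the question mentions is exactly where avoid_cpu refuses to act, and the reserved CPU sits idle whenever the managers have nothing to do.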

Link: Thread binding to a CPU core

Explicit yielding is generally not necessary; in fact, it is often discouraged. To quote Robert Love in "Linux System Programming":

In practice, there are few legitimate uses of sched_yield() on a proper preemptive multitasking system such as Linux. The kernel is fully capable of making the optimal and most efficient scheduling decisions - certainly, the kernel is better equipped than an individual application to decide what to preempt and when.

The exception he mentions is when you are waiting for external events, for example ones caused by the user, hardware, or another process. That is not the case in your example.


Adding to myaut's excellent answer: consider using a kernel built with the CONFIG_PREEMPT_RT patch set. This makes some fairly drastic changes to how the kernel does scheduling, and the result is that scheduling latency becomes much more deterministic.

Used in conjunction with the right relative thread priorities (managers > workers) and any of myaut's suggestions (especially SCHED_FIFO), it gives very good results.


Source: https://habr.com/ru/post/1211353/

