What does struct sched_domain in include/linux/sched.h mean (scheduling domains in the kernel)?

I am trying to understand how the load balancer works on a multiprocessor system in the Linux kernel.

The Linux scheduler mainly uses runqueues to store the tasks it needs to run next. On a multiprocessor system, balancing across runqueues is handled by load_balance(), which Robert Love's book "Linux Kernel Development" explains as follows:

First, load_balance() calls find_busiest_queue() to determine the busiest runqueue; in other words, the one with the largest number of processes in it. If there is no runqueue that has 25% or more processes than the current one, find_busiest_queue() returns NULL and load_balance() returns. Otherwise, the busiest runqueue is returned.

Second, load_balance() decides which priority array on the busiest runqueue it wants to pull from. The expired array is preferable because those tasks have not run for a relatively long time and are therefore most likely not in the processor's cache (that is, they are not cache-hot). If the expired priority array is empty, the active one is the only choice.

Next, load_balance() finds the highest-priority (lowest value) list that has tasks, because it is more important to fairly distribute higher-priority tasks than lower-priority ones.

Each task of that priority is analyzed to find a task that is not running, is not prevented from migrating via processor affinity, and is not cache-hot. If a task meets these criteria, pull_task() is called to pull the task from the busiest runqueue to the current runqueue.
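The three criteria above can be sketched as a small userspace model. This is an illustrative simplification, not the kernel's actual migration check; the struct fields and function name here are hypothetical:

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the three criteria the text describes. */
struct task {
    bool running;               /* currently executing on its CPU */
    unsigned long cpus_allowed; /* affinity bitmask: bit N set => CPU N allowed */
    bool cache_hot;             /* ran recently, data likely still in cache */
};

/* A task may be pulled to dest_cpu only if it is not running, its
 * affinity mask permits dest_cpu, and it is not cache-hot. */
static bool can_migrate(const struct task *t, int dest_cpu)
{
    if (t->running)
        return false;
    if (!(t->cpus_allowed & (1UL << dest_cpu)))
        return false;
    if (t->cache_hot)
        return false;
    return true;
}
```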

As long as the runqueues remain unbalanced, the previous two steps are repeated and more tasks are pulled from the busiest runqueue to the current one. Finally, when the imbalance is resolved, the current runqueue is unlocked and load_balance() returns.

The code is as follows:

static int load_balance(int this_cpu, runqueue_t *this_rq,
                        struct sched_domain *sd, enum idle_type idle)
{
    struct sched_group *group;
    runqueue_t *busiest;
    unsigned long imbalance;
    int nr_moved;

    spin_lock(&this_rq->lock);

    group = find_busiest_group(sd, this_cpu, &imbalance, idle);
    if (!group)
        goto out_balanced;

    busiest = find_busiest_queue(group);
    if (!busiest)
        goto out_balanced;

    nr_moved = 0;
    if (busiest->nr_running > 1) {
        double_lock_balance(this_rq, busiest);
        nr_moved = move_tasks(this_rq, this_cpu, busiest,
                              imbalance, sd, idle);
        spin_unlock(&busiest->lock);
    }
    spin_unlock(&this_rq->lock);

    if (!nr_moved) {
        sd->nr_balance_failed++;
        if (unlikely(sd->nr_balance_failed > sd->cache_nice_tries + 2)) {
            int wake = 0;

            spin_lock(&busiest->lock);
            if (!busiest->active_balance) {
                busiest->active_balance = 1;
                busiest->push_cpu = this_cpu;
                wake = 1;
            }
            spin_unlock(&busiest->lock);
            if (wake)
                wake_up_process(busiest->migration_thread);
            sd->nr_balance_failed = sd->cache_nice_tries;
        }
    } else
        sd->nr_balance_failed = 0;

    sd->balance_interval = sd->min_interval;

    return nr_moved;

out_balanced:
    spin_unlock(&this_rq->lock);

    if (sd->balance_interval < sd->max_interval)
        sd->balance_interval *= 2;

    return 0;
}

What I don't understand is the struct sched_domain *sd parameter in the code above. This structure is defined in include/linux/sched.h: http://lxr.linux.no/linux+v3.7.1/include/linux/sched.h#L895 It is a big structure, so I just gave the link for simplicity. I want to know what struct sched_domain is used for in the above code.

Why is it used when load_balance() is called, and what does this structure represent?

There is also http://www.kernel.org/doc/Documentation/scheduler/sched-domains.txt — but why does the CPU need scheduling domains? What do these domains mean?

1 answer

Scheduler domains and scheduler groups ease the process of scheduling tasks, such as:

  • load balancing tasks across CPUs;
  • choosing a CPU for a new task to run on;
  • choosing a CPU for a sleeping task to run on when it wakes up.

The advantage is twofold:

  • It organizes the CPUs in the system into groups and hierarchies.

  • It organizes the CPUs in a way that is useful: all CPUs that share an L2 cache belong to one domain, and all CPUs that share an L3 cache
    belong to a higher-level domain which encompasses all the domains that share the L2 cache.
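As a rough illustration of that hierarchy, here is a heavily simplified userspace sketch of how domains and groups might be linked. The field names are illustrative, not the actual definitions from include/linux/sched.h:

```c
/* Greatly simplified sketch: a domain covers a span of CPUs and points
 * both at its child groups and at its parent (higher-level) domain. */
struct sched_group {
    struct sched_group *next; /* circular list of groups within a domain */
    unsigned long cpu_mask;   /* CPUs covered by this group */
};

struct sched_domain {
    struct sched_domain *parent; /* e.g. the L3-sharing domain sd1 */
    struct sched_group *groups;  /* e.g. the L2-sharing sd0s as groups */
    unsigned long span;          /* all CPUs this domain covers */
};
```

The key invariant is that a parent domain's span always contains its children's spans, mirroring how an L3-sharing domain contains the L2-sharing domains.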

The benefits you get from a tree-like data structure are similar to the benefits that scheduler domains and groups provide here.

Refer to the following diagram:

   _________sd1________
  /                    \
  ----------------------
         l3 cache
  ----------------------
  ---------   ----------
  l2 cache     l2 cache
  ---------   ----------
  cpu0 cpu1   cpu2 cpu3
  \_______/   \________/
     sd0         sd0

   ________sd1_________
  /                    \
  ----------------------
         l3 cache
  ----------------------
  ---------   ----------
  l2 cache     l2 cache
  ---------   ----------
  cpu4 cpu5   cpu6 cpu7
  \_______/   \________/
     sd0         sd0

What you see above is the scheduler domain hierarchy. sd1 encompasses the sd0s, which are the scheduler groups of sd1. Every CPU has a scheduler domain hierarchy associated with it. For instance,
cpu0->sd = sd0 and sd0->parent = sd1. This way, through the linked list, we can iterate over all the scheduler domains that a CPU belongs to.
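That walk up the parent links can be sketched like this. This is a simplified userspace model in the spirit of the kernel's for_each_domain() iteration, not the kernel's actual code:

```c
#include <stddef.h>

/* Minimal domain: just the parent link needed for the upward walk. */
struct sched_domain {
    struct sched_domain *parent; /* NULL at the top of the hierarchy */
    const char *name;            /* e.g. "sd0 (L2)", "sd1 (L3)" */
};

/* Count how many domain levels a CPU belongs to by starting at its
 * lowest-level domain and following parent pointers to the top. */
static int count_domains(struct sched_domain *sd)
{
    int levels = 0;

    for (; sd; sd = sd->parent)
        levels++;
    return levels;
}
```

For cpu0 in the diagram, the walk visits sd0 and then sd1, so two levels.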

How does this help?

1. Load balancing: say cpu0 is idle and ready to pull tasks to relieve any other burdened CPU. It first checks whether the other CPU belonging to its lowest-level scheduling domain needs relieving; here, that is cpu1. If so, it pulls tasks from cpu1; otherwise, it goes up to the higher-level domain, sd1. Choosing to migrate a task from cpu1 is the best option, because the cache contents can be reused thanks to the shared cache; there is no need to fetch from memory again. This is the first advantage: scheduler domains are formed based on the benefits that the hardware provides.

If it goes up to sd1, it examines sd1's groups, which are the sd0s. Here is the next advantage: it only needs information about a scheduler group as a whole and does not bother with the individual CPUs in it. It checks whether load(sd0[cpu2, cpu3]) > load(sd0[cpu0, cpu1]); only if this is true does it conclude that cpu2/cpu3 are more loaded and descend further. If there were no scheduler domains or groups, we would have to look at the states of cpu2 and cpu3 in two iterations instead of one, as we do now.
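The group-level comparison can be sketched as follows. This is a hypothetical model with two CPUs per group and load represented as a plain number, purely to illustrate that one group-vs-group comparison replaces per-CPU checks:

```c
#define CPUS_PER_GROUP 2

/* A scheduler group viewed only through its per-CPU loads. */
struct group {
    unsigned long load[CPUS_PER_GROUP];
};

/* Aggregate load of a group: the only number the balancer needs at
 * this level of the hierarchy. */
static unsigned long group_load(const struct group *g)
{
    unsigned long sum = 0;

    for (int i = 0; i < CPUS_PER_GROUP; i++)
        sum += g->load[i];
    return sum;
}

/* One comparison decides whether the remote group is worth descending
 * into -- instead of examining each remote CPU individually. */
static int remote_busier(const struct group *local, const struct group *remote)
{
    return group_load(remote) > group_load(local);
}
```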

Now scale this problem and its solution up to 128 CPUs! Imagine what a mess it would be if there were nothing to tell you which CPU would be the best one to relieve of load; in the worst case, you would have to walk through all 128 CPUs.

But with scheduler domains and groups, say you split the 128 CPUs into groups of 16 CPUs each; you would then have 8 groups. Finding the busiest group takes 8 iterations; then you descend into it and find the busiest of its 16 CPUs, which takes another 16 iterations. So the worst case is

8 + 16 = 24 iterations. And this reduction comes from just one level of scheduler domains. Imagine if you had more levels; you would drive the iteration count even lower.
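The arithmetic above generalizes to a simple worst-case formula for a single domain level. This is just an illustrative calculation, assuming n CPUs evenly split into groups of g:

```c
/* Worst-case probes with one domain level: examine each of the n/g
 * groups once, then each of the g CPUs inside the busiest group.
 * Reproduces the 128-CPU example from the text: 128/16 + 16 = 24. */
static unsigned int worst_case_probes(unsigned int ncpus,
                                      unsigned int group_size)
{
    return ncpus / group_size + group_size;
}
```

Compare this with the flat case, where the worst case is all 128 CPUs.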

In short, scheduler domains and groups are a "divide and conquer, but conquer what is most useful" solution to scheduling related work.

I have posted this write-up in case someone wants to read it in the future.

