How to use GNU make --max-load on a multi-core Linux machine?

From the documentation for GNU make: http://www.gnu.org/software/make/manual/make.html#Parallel

When the system is heavily loaded, you will probably want to run fewer jobs than when it is lightly loaded. You can use the -l option to tell make to limit the number of jobs to run at once, based on the load average. The -l or --max-load option is followed by a floating-point number. For example,

-l 2.5 

will not let make start more than one job if the load average is above 2.5. The -l option with no following number removes the load limit, if one was given with a previous -l option.

More precisely, when make goes to start up a job, and it already has at least one job running, it checks the current load average; if it is not lower than the limit given with -l, make waits until the load average goes below that limit, or until all the other jobs finish.
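For instance, a typical combined invocation might look like this (the numbers here are illustrative, not taken from the manual):

  # Allow up to 8 parallel jobs, but stop spawning new ones while the
  # load average is above 6.0.
  make -j8 -l 6.0

  # Equivalent long form.
  make -j8 --max-load=6.0

  # A bare -l (no number) removes a load limit given earlier,
  # for example via MAKEFLAGS.
  make -j8 -l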

From the Linux man page for uptime: http://www.unix.com/man-page/Linux/1/uptime/

System load averages are the average number of processes that are either in a runnable or an uninterruptible state. A process in a runnable state is either using the CPU or waiting to use the CPU. A process in an uninterruptible state is waiting for some I/O access, for example waiting for disk. The averages are taken over three time intervals. Load averages are not normalized for the number of CPUs in a system, so a load average of 1 means a single-CPU system is loaded all the time, while on a 4-CPU system it means it was idle 75% of the time.
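Since the kernel does not normalize these numbers, a quick way to put them in context is to compare them against the core count; a small sketch (not from the man page):

  nproc                  # number of online CPUs
  cat /proc/loadavg      # 1-, 5- and 15-minute load averages

  # 1-minute load average per core (a rough "how busy is this box")
  awk -v cores="$(nproc)" '{ printf "%.2f\n", $1 / cores }' /proc/loadavg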

I have a parallel makefile, and I want to do the obvious thing: have make keep adding processes until I reach full CPU utilization, without causing thrashing.

Many (all?) machines today are multi-core, so this means that the load average is not the number to check, since that number needs to be adjusted for the number of cores.

Does this mean that --max-load (aka -l) is now useless for GNU make? What do people who run parallel makefiles on multi-core machines do?

4 answers

My short answer is: --max-load is useful if you are willing to invest the time needed to use it effectively. With its current implementation there is no simple formula for choosing good values, and no pre-packaged tool for discovering them.


The build I maintain is quite large. Before I started maintaining it, a build took 6 hours. With -j64 on a ramdisk it now finishes in 5 minutes (30 on an NFS mount with -j12). My goal was to find reasonable limits for -j and -l that let our developers build quickly without making the server (build server or NFS server) unusable for everyone else.

To start with (see the example invocations after this list):

  • If you choose a reasonable -jN value (for your machine) and find a reasonable upper limit for the load average (again, for your machine), they work nicely together to keep things balanced.
  • If you use a very large -jN value (or leave it unspecified, e.g. -j without a number) and limit the load average, gmake will:
    • continue spawning processes (gmake 3.81 added a throttling mechanism, but it only mitigates the problem slightly) until the maximum number of jobs is reached or until the load average goes above your threshold
    • while the load average is above your threshold:
      • do nothing until all subprocesses have finished
      • spawn one job at a time
    • do it all over again
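A hedged illustration of the difference (the numbers are placeholders, not measurements from this answer):

  # Bounded parallelism plus a load cap: the two limits balance each other.
  make -j16 -l 12

  # Unbounded -j with only a load cap: this is the spawn-burst /
  # stall-and-trickle behaviour described in the list above.
  make -j -l 12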

On Linux at least (and probably on other *nix variants), the load average is an exponential moving average (see "UNIX Load Average Reweighed", Neil J. Gunther) of the number of processes waiting for CPU time (which can be caused by too many processes, waiting on I/O, page faults, etc.). Since it is an exponential moving average, it is weighted so that newer samples have a stronger influence on the current value than older samples.
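Concretely, for the 1-minute figure the kernel applies a recurrence of roughly the following shape every 5 seconds (my paraphrase of its fixed-point arithmetic; n(t) is the number of runnable plus uninterruptible tasks at that sample):

  \mathrm{load}_{1}(t) \approx \mathrm{load}_{1}(t - 5\,\mathrm{s}) \cdot e^{-5/60} + n(t) \cdot \left(1 - e^{-5/60}\right)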

If you can identify a good sweet spot for the right max load and number of parallel jobs (using a combination of educated guesses and empirical testing), then, assuming you have a long-running build, your 1-minute load average will reach an equilibrium point (it will not fluctuate much). However, if your -jN value is too high for the given max load average, it will fluctuate quite a bit.

Finding that sweet spot is essentially equivalent to finding optimal parameters for a differential equation. Since it is subject to initial conditions, the focus is on finding parameters that cause the system to stay at equilibrium, rather than on hitting a "target" load average. By "at equilibrium" I mean: the 1-minute load average does not fluctuate much.
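The empirical half of that testing can be as simple as a timing sweep over -jN; a rough sketch (hypothetical script: it assumes a clean target and a default build goal, adjust to your tree):

  #!/bin/sh
  # Time a full rebuild at several -j values and note the 1-minute load
  # average at the end of each run, to see where the curve flattens out.
  for j in 8 12 16 24 32 48 64; do
      make clean >/dev/null 2>&1
      start=$(date +%s)
      make -j"$j" >/dev/null 2>&1
      end=$(date +%s)
      printf '%s jobs: %ss, 1-min load %s\n' \
          "$j" "$((end - start))" "$(cut -d' ' -f1 /proc/loadavg)"
  done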

Assuming you are not bottlenecked by limitations in gmake: when you find a -jN -lM combination that gives a minimum build time, that combination will be pushing your machine to its limits. If the machine needs to be used for other things besides compiling, you may want to scale it back a bit once you are done optimizing.

Load average aside, the improvements I saw in build time as -jN increased turned out to be [roughly] logarithmic. That is, I saw a bigger difference between -j8 and -j12 than between -j12 and -j16.

Things peaked for me somewhere between -j48 and -j64 (on Solaris it was around -j56), because the initial gmake process is single-threaded; at some point that thread cannot start new jobs faster than they finish.

My tests were conducted on:

  • A non-recursive build
    • recursive builds may see different results; they would not hit the bottleneck I ran into around -j64
    • I did my best to minimize the number of make-isms (variable expansions, macros, etc.) in recipes, because recipes are parsed in the same thread that spawns parallel jobs. The more complicated the recipes are, the more time make spends in the parser instead of spawning and reaping jobs. For example:
      • $(shell ...) macros are kept out of recipes; they run during the first parsing pass and their results are cached
      • Most variables are assigned with := to avoid recursive expansion
  • Solaris 10 / sparc
    • 256 cores
    • no virtualization / logical domains
    • the build runs on a ramdisk
  • x86_64 linux
    • 32-core (4x hyper-threading)
    • no virtualization
    • the build runs on a fast local disk

Many (all?) machines today are multi-core, so this means that the load average is not the number to check, since that number needs to be adjusted for the number of cores.

Does this mean that the --max-load (aka -l) flag for GNU make is now useless?

No. Imagine jobs with heavy disk I/O. If you started as many jobs as you have CPUs, you still would not be using the CPU very well.

Personally, I just use -j, because so far it has worked well enough for me.


Even for builds where the CPU is the bottleneck, -l is not ideal. I use -jN, where N is the number of cores that exist or that I want to spend on the build. Choosing a larger number does not speed up the build in my situation. It also does not slow it down, as long as you do not go overboard (for example, passing infinity via a bare -j).

Using -lN is broadly equivalent to -jN, and might work better if the machine also has other, independent work to do, but there are two quirks (besides the one you mentioned, that the number of cores is not taken into account):

  • Initial spike: when the build begins, make launches many jobs, far more than N, because the load average does not rise immediately when processes are forked. Not a problem in my situation.
  • Starvation: when some build jobs take a long time compared to others, then at the moment the M quicker jobs finish, the load average is still above N. Soon the load average drops to N - M, but as long as those few slow jobs are dragging on, no new jobs are started and the cores are left hungry. Make only considers starting new jobs when an old job finishes and at the very start; it does not notice the load average dropping in between.

Does this mean that the --max-load (aka -l) flag is now useless for GNU make? What do people who run parallel makefiles on multi-core machines do?

One example is running the jobs of a test suite, where each test has to compile and link a program. Sometimes linking loads the system too heavily, resulting in fatal error: ld terminated with signal 9 [Killed]. In my case it was not memory overcommitment but CPU usage, so the usually suggested swap file did not help.

With the option -l 1, execution is still parallel, but linking is almost sequential: (system monitor screenshot showing resource consumption)
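An invocation along these lines keeps compilation parallel while the load cap throttles the link-heavy jobs (the check target and the -j value are placeholders, not from the original post):

  # Compilation of individual tests still runs in parallel, but no new job
  # is started while the 1-minute load average is at or above 1, which in
  # practice throttles the expensive link steps.
  make -j8 -l 1 check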

