Using make to do independent tasks in parallel

I have a bunch of commands that I would like to execute in parallel. The commands are almost identical. They can be expected to take about the same time, and they can run completely independently. They may look like this:

    command -n 1 > log.1
    command -n 2 > log.2
    command -n 3 > log.3
    ...
    command -n 4096 > log.4096

I could run them all in parallel in a shell script, but the system would try to load more than is strictly necessary to keep the CPU busy (each task takes 100% of one core until it is complete). This can cause the disk to thrash and make everything slower than a less greedy approach to execution.

The best approach is probably to keep n tasks running at a time, where n is the number of available cores.

I try not to reinvent the wheel. This problem has already been solved by the Unix make program (when used with the -j n option). I was wondering if it is possible to write generic Makefile rules for the above, to avoid creating a linear-sized Makefile that would look like this:

    all: log.1 log.2 ...

    log.1:
    	command -n 1 > log.1

    log.2:
    	command -n 2 > log.2
    ...

If the best solution is not make but another program or utility, I am open to that as long as the dependencies are reasonable (make is very good in this regard).

5 answers

See pattern rules.

Alternatively, if this is the only reason you need make, use xargs with the -n and -P options.
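As a rough illustration of the xargs approach, here is a minimal sketch. The job count (8), the parallelism (4), the scratch directory, and the printf stand-in for the real command are all assumptions made for the demo, not part of the original question.

```shell
#!/bin/sh
# Sketch: run 8 independent jobs, at most 4 at a time, with xargs.
# ASSUMPTIONS: the job count (8), the parallelism (4), and the printf
# stand-in are illustrative; swap printf for the real command.
outdir=$(mktemp -d)   # scratch directory for the demo logs

# -n 1 passes one number per invocation; -P 4 keeps up to 4 running.
seq 1 8 | xargs -n 1 -P 4 sh -c '
    # $0 is the output directory, $1 is the job number from seq
    printf "job %s\n" "$1" > "$0/log.$1"
' "$outdir"

ls "$outdir"
```

Note that -P is supported by GNU and BSD xargs but is not required by POSIX, so it may be missing on some systems.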


Here is more portable shell code that does not depend on shell-specific extensions:

    LOGS := $(shell seq 1 1024)

Note the use of := to define a more efficient variable: the simply expanded "flavor".


The easy part first. As Roman Chaplyak points out, pattern rules are very useful:

    LOGS = log.1 log.2 ... log.4096

    all: $(LOGS)

    log.%:
    	command -n $* > log.$*

The tricky part is creating this LOGS list. Make doesn't handle numbers very well. The best way is probably to invoke the shell. (You may need to adjust this script for your shell; shell scripting is not my strongest subject.)

    NUM_LOGS = 4096
    LOGS = $(shell for ((i=1 ; i<=$(NUM_LOGS) ; ++i)) ; do echo log.$$i ; done)
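The for ((...)) loop above is not part of POSIX sh, so it may fail when make runs recipes with a plain /bin/sh. A possible POSIX-only equivalent is sketched below; the variable names and the small count of 5 are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: build the list "log.1 log.2 ... log.N" using only POSIX sh.
# ASSUMPTION: num_logs=5 is a small illustrative value.
num_logs=5
logs=""
i=1
while [ "$i" -le "$num_logs" ]; do
    logs="$logs log.$i"   # append the next target name
    i=$((i + 1))
done
logs=${logs# }            # drop the leading space
echo "$logs"
```

If a loop like this is placed inside $(shell ...) in a Makefile, it has to be written on one logical line and every shell $ must be doubled to $$ so that make does not expand it first.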

xargs -P is the "standard" way to do this. Note that, depending on disk I/O, you may want to limit by spindles rather than cores. If you want to limit by cores, look at the new nproc command in recent coreutils.
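One way to turn the core count into a job limit might look like the sketch below. The variable name njobs, the getconf fallback, and the last-resort value of 1 are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: pick a job count from the number of available cores.
# nproc comes from GNU coreutils; getconf is tried as a fallback,
# and 1 is an assumed last resort if neither works.
njobs=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
echo "would run with $njobs parallel jobs"

# The value can then be passed on, for example:
#   make -j"$njobs" all
#   seq 1 4096 | xargs -n 1 -P "$njobs" ...
```

If the jobs are bound by disk I/O rather than CPU, a smaller hand-picked value may work better than the core count.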


With GNU Parallel, you would write:

 parallel command -n {} ">" log.{} ::: {1..4096} 

10 second installation:

 (wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash 

More details: http://www.gnu.org/software/parallel/parallel_tutorial.html https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
