What does --ntasks or -n do in SLURM?

I am using SLURM on a computing cluster, and it has the --ntasks (or -n) option. I have read the documentation for it (http://slurm.schedmd.com/sbatch.html):

sbatch does not launch tasks, it requests an allocation of resources and submits a batch script. This option advises the Slurm controller that job steps run within the allocation will launch a maximum of number tasks and to provide for sufficient resources. The default is one task per node, but note that the --cpus-per-task option will change this default.

The specific part I don't understand is:

job steps run within the allocation will launch a maximum of number tasks and to provide for sufficient resources

I have a few questions:

  • I guess my first question is what the word “task” means, and how it differs from the word “job” in the context of SLURM. I usually think of a job as the bash script submitted with sbatch, as in sbatch my_batch_job.sh. I am not sure what a task is.
  • If I equate the word task with job, I thought it would run the same bash script several times according to the argument -n, --ntasks=<number>. However, I tested it on the cluster: I ran echo hello with --ntasks=9 and expected sbatch to print hello 9 times to stdout (which is collected in slurm-<job_id>.out), but to my surprise there was only one execution of my echo hello script. So what does this option do? It seems to do nothing, or at least I can't figure out what it should do.

I know that the -a, --array=<indexes> option exists for running multiple jobs, but that is a different topic. I just want to know what --ntasks does, ideally with an example I can test on a cluster.


The --ntasks option determines how many instances of your command are run. On a typical cluster setup, when you launch your command with srun, this corresponds to the number of MPI ranks.

In contrast, the --cpus-per-task option specifies how many CPUs each task can use.

Your result surprises me. Did you run your command directly in the script or via srun? Does your script look like this?

    #!/bin/bash
    #SBATCH --ntasks=8
    ## more options

    echo hello

This should always output only one line, because the batch script itself is executed only once, on the first node of the allocation, not once per task.

If your script looks like this instead:

    #!/bin/bash
    #SBATCH --ntasks=8
    ## more options

    srun echo hello

srun launches your command as a job step on the allocated compute nodes, once per task, so you should get 8 "hello" lines.
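To see that each task is a separate instance of the command, the script above can be extended so that every line reports which task printed it. This is a sketch, not a canonical test; it assumes a standard Slurm setup where sbatch sets SLURM_NTASKS in the job environment and srun sets SLURM_PROCID (the task's rank) for each task. The output file name is hypothetical.

```bash
#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --output=ntasks-demo-%j.out   # hypothetical file name; %j is the job ID

# The batch script itself runs once, so this line appears once.
echo "batch script on $(hostname), SLURM_NTASKS=$SLURM_NTASKS"

# srun starts one copy of the command per task (4 here).
# Single quotes delay expansion, so each task expands its own SLURM_PROCID.
srun bash -c 'echo "hello from task $SLURM_PROCID of $SLURM_NTASKS on $(hostname)"'
```

Submitted with sbatch, the output file should contain one "batch script" line and four "hello from task" lines (ranks 0 to 3), in no guaranteed order and possibly from different nodes.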


--ntasks is useful when you want to run several commands in parallel within one batch script. These can be two separate commands backgrounded with &, or two commands connected by a bash pipe (|).

For example, using the default --ntasks=1:

    #!/bin/bash
    #SBATCH --ntasks=1

    srun sleep 10 &
    srun sleep 12 &
    wait

produces the warning:

    Job step creation temporarily disabled, retrying

Because the number of tasks is one, the second job step cannot start until the first one has finished, so the job takes about 22 seconds. The sacct breakdown shows this:

    sacct -j 515058 --format=JobID,Start,End,Elapsed,NCPUS

    JobID        Start               End                 Elapsed    NCPUS
    ------------ ------------------- ------------------- ---------- ----------
    515058       2018-12-13T20:51:44 2018-12-13T20:52:06 00:00:22   1
    515058.batch 2018-12-13T20:51:44 2018-12-13T20:52:06 00:00:22   1
    515058.0     2018-12-13T20:51:44 2018-12-13T20:51:56 00:00:12   1
    515058.1     2018-12-13T20:51:56 2018-12-13T20:52:06 00:00:10   1

Here job step 0 started and completed (after 12 seconds), followed by job step 1 (after 10 seconds), so the total elapsed time is 22 seconds.

To run both of these commands at the same time:

    #!/bin/bash
    #SBATCH --ntasks=2

    srun --ntasks=1 sleep 10 &
    srun --ntasks=1 sleep 12 &
    wait

Running the same sacct command as above gives:

    sacct -j 515064 --format=JobID,Start,End,Elapsed,NCPUS

    JobID        Start               End                 Elapsed    NCPUS
    ------------ ------------------- ------------------- ---------- ----------
    515064       2018-12-13T21:34:08 2018-12-13T21:34:20 00:00:12   2
    515064.batch 2018-12-13T21:34:08 2018-12-13T21:34:20 00:00:12   2
    515064.0     2018-12-13T21:34:08 2018-12-13T21:34:20 00:00:12   1
    515064.1     2018-12-13T21:34:08 2018-12-13T21:34:18 00:00:10   1

Here the whole job took 12 seconds. There is no risk of job steps waiting for resources, because the number of tasks is specified in the batch script, so the job has enough resources to run that many commands simultaneously.

Each job step inherits the options specified for the batch script. That is why --ntasks=1 must be given to each srun command; otherwise each step uses --ntasks=2, and the second command will not start until the first one has completed.

Another caveat about steps inheriting the batch options applies if --export=NONE is specified as a batch option. In that case, --export=ALL must be given to each srun command; otherwise the environment variables set in the sbatch script are not inherited by the srun command.
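A minimal sketch of that caveat, assuming a cluster where jobs are submitted with --export=NONE. MY_SETTING is a hypothetical variable used only for illustration; the first step is expected, per the caveat above, to miss it.

```bash
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --export=NONE

# Hypothetical variable set inside the batch script.
export MY_SETTING=42

# Per the caveat, this step may not see MY_SETTING and print "<unset>".
srun bash -c 'echo "without --export=ALL: ${MY_SETTING:-<unset>}"'

# Passing --export=ALL forwards the batch script's environment to the step.
srun --export=ALL bash -c 'echo "with --export=ALL: ${MY_SETTING:-<unset>}"'
```

The two steps run one after the other (no &), so a single task is enough here.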

Additional notes:
  • When using bash pipes, you may need to specify --nodes=1 to prevent the commands on either side of the pipe from running on separate nodes.
  • When using & to run commands at the same time, the wait is vital. Without it, the batch script exits as soon as it has launched the steps, and since the job ends when the batch script ends, any step still running is cancelled.
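Both notes can be combined in one sketch. It assumes the cluster lets two single-task steps run concurrently inside a two-task job; depending on the Slurm version and configuration, concurrent steps may additionally need srun's --exact option. The commands (seq, awk, sleep) are placeholders.

```bash
#!/bin/bash
#SBATCH --ntasks=2
#SBATCH --nodes=1          # keep both sides of the pipe on one node

# Two single-task steps joined by a bash pipe; awk prints the sum 1+2+3+4+5 = 15.
srun --ntasks=1 seq 1 5 | srun --ntasks=1 awk '{ s += $1 } END { print s }'

# Two steps backgrounded with &; wait blocks until both have finished.
# Without wait, the batch script (and therefore the job) would end immediately.
srun --ntasks=1 sleep 10 &
srun --ntasks=1 sleep 12 &
wait
```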
