GPU Resource Planning Using the Sun Grid Engine (SGE)

We have a cluster of machines, each of which has 4 GPUs. Each job should be able to request 1-4 GPUs. Here's the catch: I would like SGE to tell each job which GPU(s) to use. Unlike a CPU, a GPU works best if only one process accesses it at a time. So I would like to have:

    Job #1  GPU: 0, 1, 3
    Job #2  GPU: 2
    Job #4  wait until 1-4 GPUs are available

The problem I am facing is that SGE will let me create a GPU resource with 4 units on each node, but it will not explicitly tell a job which GPU to use (only that it gets 1, or 3, or whatever).

I was thinking of creating 4 resources (gpu0, gpu1, gpu2, gpu3), but I am not sure whether the -l flag accepts a glob pattern, and I can't figure out how SGE would tell the job which GPU resources it received. Any ideas?

+4
2 answers

If you have several GPUs and want your jobs to request a GPU, while leaving it to the Grid Engine scheduler to track and select free GPUs, you can configure an RSMAP (resource map) complex instead of an INT. This lets you specify the number as well as the names of the GPUs on a specific host in the host configuration. You can also set it up as a HOST consumable, so that regardless of how many slots your job requests, the number of GPU devices requested with -l cuda=2 is 2 on each host (even if the parallel job received, say, 8 slots on different hosts).

    qconf -mc
    #name   shortcut   type    relop   requestable   consumable   default   urgency
    #----------------------------------------------------------------------------------------------
    gpu     gpu        RSMAP   <=      YES           HOST         0         0

In the execution host configuration, you can initialize your resources with IDs/names (here simply GPU1 and GPU2).

    qconf -me yourhost
    hostname              yourhost
    load_scaling          NONE
    complex_values        gpu=2(GPU1 GPU2)
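To make the difference from a plain INT counter concrete, here is a toy Python model of a resource map: a pool of named units from which a request takes specific ids. This is only a sketch of the idea, not Grid Engine's scheduler; the `ResourceMap` class and the job names are hypothetical, and handing out the first free ids in order is an assumption made for illustration.

```python
class ResourceMap:
    """Toy model of an RSMAP-style consumable: named units on one host."""

    def __init__(self, ids):
        self.free = list(ids)   # ids not granted to any job yet
        self.in_use = {}        # job name -> list of granted ids

    def request(self, job, amount):
        """Grant `amount` named units to `job`, or return None if it must wait."""
        if amount > len(self.free):
            return None
        granted, self.free = self.free[:amount], self.free[amount:]
        self.in_use[job] = granted
        return granted

    def release(self, job):
        """Return the units held by `job` to the free pool."""
        self.free.extend(self.in_use.pop(job, []))


if __name__ == "__main__":
    host = ResourceMap(["GPU1", "GPU2"])   # mirrors gpu=2(GPU1 GPU2)
    print(host.request("job1", 1))
    print(host.request("job2", 1))
    print(host.request("job3", 1))         # nothing free, so the job waits
```

An INT consumable would only track the count (2, 1, 0); the map additionally records which named unit each job holds, which is what lets a job learn its GPU.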

Then, when a job requests -l gpu=1, the Univa Grid Engine scheduler will select GPU2 if GPU1 is already in use by another job. You can see the actual selection in the qstat -j output. The job learns which GPU was selected by reading the environment variable $SGE_HGR_gpu, which in this case contains the chosen id/name "GPU2". This can be used to access the right GPU without collisions.
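Inside the job, the granted names still have to be translated into whatever the GPU runtime expects. The following Python sketch shows one way to do that. It assumes that the variable holds space-separated ids when more than one GPU is granted, that the ids follow the GPU1, GPU2 naming used above, and that GPU1 corresponds to CUDA device 0. All three are assumptions to check against your own host configuration, and the helper names are hypothetical.

```python
import os
import re


def granted_gpu_indices(env=os.environ, var="SGE_HGR_gpu"):
    """Return zero-based device indices for the GPU ids granted to this job.

    Assumes ids look like "GPU<n>" with n starting at 1 (site convention).
    """
    indices = []
    for name in env.get(var, "").split():
        match = re.fullmatch(r"GPU(\d+)", name)
        if match is None:
            raise ValueError(f"unexpected GPU id {name!r} in {var}")
        indices.append(int(match.group(1)) - 1)
    return indices


def cuda_visible_devices(env=os.environ):
    """Build a CUDA_VISIBLE_DEVICES value from the granted GPU ids."""
    return ",".join(str(i) for i in granted_gpu_indices(env))


if __name__ == "__main__":
    # Restrict this process (and its children) to the granted GPUs only.
    if "SGE_HGR_gpu" in os.environ:
        os.environ["CUDA_VISIBLE_DEVICES"] = cuda_visible_devices()
```

Setting CUDA_VISIBLE_DEVICES before any CUDA library is initialized keeps the job from touching GPUs that were granted to other jobs.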

If you have a multi-socket host, you can even attach a GPU directly to the CPU cores nearest to it (close on the PCIe bus) to speed up communication between the GPU and the CPUs. This is done by attaching a topology mask to each GPU in the execution host configuration.

    qconf -me yourhost
    hostname              yourhost
    load_scaling          NONE
    complex_values        gpu=2(GPU1:SCCCCScccc GPU2:SccccSCCCC)

Now, when the UGE scheduler selects GPU2, it automatically binds the job to all 4 cores (C) of the second socket (S), so the job is not allowed to run on the first socket. This does not even require the -binding qsub parameter.
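The mask is easier to read when expanded into explicit (socket, core) pairs. The short Python sketch below does that for the two masks above, taking an uppercase C as a core the job may use and a lowercase c as a core it may not use, as described in this answer. The function name is hypothetical, and this is only an aid for reading masks, not how UGE itself processes them.

```python
def allowed_cores(mask):
    """Expand a mask such as "SccccSCCCC" into a list of (socket, core) pairs.

    "S" starts a new socket, "C" marks a core the job may use,
    and "c" marks a core that is excluded.
    """
    socket = -1
    core = 0
    allowed = []
    for ch in mask:
        if ch == "S":
            socket += 1
            core = 0
        elif ch in ("C", "c"):
            if socket < 0:
                raise ValueError("core listed before any socket")
            if ch == "C":
                allowed.append((socket, core))
            core += 1
        else:
            raise ValueError(f"unexpected character {ch!r} in mask")
    return allowed


if __name__ == "__main__":
    print(allowed_cores("SCCCCScccc"))  # mask attached to GPU1
    print(allowed_cores("SccccSCCCC"))  # mask attached to GPU2
```

With sockets and cores numbered from zero, the GPU2 mask yields only cores on socket 1, matching the binding to the second socket described above.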

Additional configuration examples can be found at www.gridengine.eu.

Please note that all of these features are available only in Univa Grid Engine (8.1.0 / 8.1.3 and higher), and not in SGE 6.2u5 or other Grid Engine versions (for example OGE, Sun Grid Engine, etc.). You can try it out by downloading the free version, which is limited to 48 cores, from univa.com.

+4

If you are using one of the other Grid Engine variants, you can try adapting the scripts that we use on our cluster: https://github.com/UCL/Grid-Engine-Prolog-Scripts

+1

Source: https://habr.com/ru/post/1412072/

