I am working on an 18-node cluster using TORQUE / PBS Pro / Open MPI.
Each node has 2 processors with 12 cores per CPU (so 24 processes can run per node).
If I submit a PBS job whose process count does not divide evenly across nodes, for example a job that needs 58 processes, I can split the request as:
#PBS -l nodes=2:ppn=24+1:ppn=10
which assigns 2 nodes using all 24 cores, and 1 node using 10 cores. So now I should have 58 tasks.
However, when I run qstat -a, the output shows that I have only 48 tasks. It does not seem to account for the partially used node.
So, do these extra 10 processes actually run? What is happening? Is the qstat output simply wrong?
I have read all the PBS documentation and man pages I could find, with no luck.
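One check I could run from inside the job script is to count the entries in the node file. This is only a sketch, and it assumes TORQUE-style behaviour where $PBS_NODEFILE contains one hostname line per allocated core. The function names count_slots and slots_per_node are made up for illustration.

```shell
#!/bin/sh
# Sketch: count the slots the scheduler actually handed to this job.
# Assumption: $PBS_NODEFILE lists one hostname per allocated core.

count_slots() {
    # Number of non-empty lines in the node file ($1) = total slots.
    grep -c . "$1"
}

slots_per_node() {
    # Print "<count> <hostname>" for each distinct host in the node file ($1).
    sort "$1" | uniq -c
}

# Only report when running inside a job that provides a readable node file.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "total slots: $(count_slots "$PBS_NODEFILE")"
    slots_per_node "$PBS_NODEFILE"
fi
```

If the request above is honoured, I would expect a total of 58, with two hosts listed 24 times and one host listed 10 times.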