Parallelizing a MATLAB script in bash

I am trying to run a piece of code on a large computer cluster to analyze different pieces of data.

I wrote two nested loops to assign jobs to the different nodes and to the processors on each node. The analysis function I wrote, chnJob(), only needs an index to know which part of the data it should process (here the shell variable holding that index is called chn).

The loop is as follows:

    for NODE in $NODES; do  # Loop through nodes
        for job_idx in {1..$PROCS_PER_NODE}; do  # Loop through jobs per node (8 per node)
            echo "this is the channel $chn"
            ssh $NODE "matlab -nodisplay -nodesktop -nojvm -nosplash -r 'cd $WORK_DIR; chnJob($chn); quit'" &
            let chn++
            sleep 2
        done
    done

Although I can see that the variable chn is increasing, the value of chn that is passed to the matlab function is always the last value of chn.

This is probably because Matlab takes a long time to open on each node and bash completes the loop by then. So the value that is passed to each matlab instance is only the last value.

Is there any way around this? Is it possible to "bake" the value of this variable when calling a function?

Or is the problem completely different?

3 answers

Bash does not expand variables inside brace range expressions, because brace expansion is performed before parameter expansion. The bounds must be literals: {1..10}. As you have it now, the inner loop always executes exactly once per iteration of the outer loop instead of eight times (or whatever the value of PROCS_PER_NODE is). As a result, chn only advances from its initial value by the number of nodes, when it should advance by the number of nodes times PROCS_PER_NODE.
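A minimal sketch of the problem, which can be run locally without any cluster. The counter names brace_iterations and brace_last are made up for this illustration; PROCS_PER_NODE is set to 8 as in the question.

```shell
#!/usr/bin/env bash
# Brace expansion happens before $PROCS_PER_NODE is substituted, so
# {1..$PROCS_PER_NODE} is never turned into a range of numbers.
PROCS_PER_NODE=8

brace_iterations=0   # hypothetical counter, for illustration only
brace_last=""        # remembers what job_idx actually contained

for job_idx in {1..$PROCS_PER_NODE}; do
    brace_iterations=$((brace_iterations + 1))
    brace_last=$job_idx
done

echo "iterations: $brace_iterations"   # prints "iterations: 1"
echo "job_idx was: $brace_last"        # prints "job_idx was: {1..8}"
```

The loop body runs once, and job_idx holds the literal string {1..8} rather than a number.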

Instead, use the C-style for :

 for ((job_idx=1; job_idx<=$PROCS_PER_NODE; job_idx++)) 

You can also increment both job_idx and chn in the for header, provided nothing else relies on chn being incremented separately (the let chn++ in the loop body must then be removed):

 for ((job_idx=1; job_idx<=$PROCS_PER_NODE; job_idx++, chn++)) 
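Putting this together, the corrected loop might look like the dry run below. It is only a sketch: the node names, the small PROCS_PER_NODE, the WORK_DIR placeholder and the launched array are assumptions made for the illustration, and the ssh command is printed rather than executed so the loop can be checked without a cluster.

```shell
#!/usr/bin/env bash
# Dry run of the corrected nested loop: prints each command instead of running it.
NODES="node01 node02"     # hypothetical node names
PROCS_PER_NODE=3          # small value to keep the output short
WORK_DIR=/path/to/work    # placeholder path
chn=1
launched=()               # hypothetical bookkeeping: which chn went to which node

for NODE in $NODES; do
    # C-style loop: the bound may be a variable, and chn advances with job_idx
    for ((job_idx = 1; job_idx <= PROCS_PER_NODE; job_idx++, chn++)); do
        launched+=("$NODE:$chn")
        echo "ssh $NODE \"matlab -nodisplay -nodesktop -nojvm -nosplash -r 'cd $WORK_DIR; chnJob($chn); quit'\" &"
    done
done

echo "dispatched ${#launched[@]} jobs, last chn=$((chn - 1))"
```

With two nodes and three jobs per node, six commands are printed and chn covers 1 through 6. To launch the jobs for real, replace the echo with the ssh command itself followed by &.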

I do not think that is what is happening. Can you try running this:

    cnt=0
    for a in 1 2; do
        for b in 1 2; do
            echo --- $cnt
            ssh somehost "echo result: '$cnt'" &
            let cnt++
        done
    done

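Replace somehost with a host where sshd is running and you can log in. This prints the numbers 0 to 3, returned by echo result: '$cnt' executed remotely. The double-quoted command string is expanded by the local shell at the moment each ssh is launched, so every remote command receives its own value and the remote execution itself works fine.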
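The same check can be made without a remote host. The sketch below is an assumption-laden stand-in: it replaces ssh with a local bash -c child that sleeps for a second to imitate a slow MATLAB startup, and collects the output in a temporary file.

```shell
#!/usr/bin/env bash
# Local stand-in for the ssh test: each child starts slowly, yet still
# receives the value cnt had when the command line was built.
out=$(mktemp)
cnt=0
for a in 1 2; do
    for b in 1 2; do
        # $cnt is substituted by this shell now, before the child even starts
        bash -c "sleep 1; echo result: '$cnt'" >> "$out" &
        cnt=$((cnt + 1))
    done
done
wait                                  # let all background jobs finish
results=$(sort "$out" | paste -sd, -) # completion order is not guaranteed, so sort
rm -f "$out"
echo "$results"
```

All four values arrive even though the loop has finished long before any child prints, which suggests the delay in starting MATLAB is not what causes the wrong value.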

One thing I can suggest is to move your command ( matlab ... ) into a script in a known folder, and then run that script from the loops above, specifying its full path. Something like:

 ssh $NODE "/path/to/script.sh $cnt" 

In the script, $1 will hold the desired value (i.e. $cnt from the loop). You can put echo $1 >> /tmp/values at the beginning of the script to collect all the values in the /tmp/values file. Of course, rm /tmp/values before you start. This will confirm whether you are getting all the values you want.
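A rough sketch of what such a wrapper could contain, written as a function so it can be tried locally. The names run_channel, VALUES_LOG and DRY_RUN are invented for this example, and the log goes to a temporary file here instead of /tmp/values; in dry-run mode (the default in this sketch) the MATLAB command is printed rather than executed.

```shell
#!/usr/bin/env bash
# Sketch of the wrapper logic; on the cluster this would be the body of
# /path/to/script.sh, with $1 as the channel index.
VALUES_LOG=$(mktemp)                  # stands in for /tmp/values
WORK_DIR=${WORK_DIR:-/path/to/work}   # placeholder path

run_channel() {
    chn=$1
    echo "$chn" >> "$VALUES_LOG"      # record which value actually arrived
    if [ "${DRY_RUN:-1}" = 1 ]; then
        # hypothetical dry-run switch: show the command instead of running it
        echo "matlab -nodisplay -nodesktop -nojvm -nosplash -r 'cd $WORK_DIR; chnJob($chn); quit'"
    else
        matlab -nodisplay -nodesktop -nojvm -nosplash -r "cd $WORK_DIR; chnJob($chn); quit"
    fi
}

# Local check: call the wrapper with the values the loop would pass
for cnt in 0 1 2 3; do
    run_channel "$cnt" > /dev/null
done
logged=$(paste -sd' ' "$VALUES_LOG")
rm -f "$VALUES_LOG"
echo "values seen: $logged"
```

If the log contains every index exactly once, the values are reaching the script correctly and any remaining problem lies elsewhere.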


If $PBS_NODEFILE contains the name of a file listing the nodes (one per line), then this should work with GNU Parallel:

  seq 1 100 | parallel --slf $PBS_NODEFILE "matlab -nodisplay -nodesktop -nojvm -nosplash -r 'cd $WORK_DIR; chnJob({}); quit'" 

Read more: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

