I like @chepner's hack. It is not hard to get similar behavior while limiting the number of parallel executions:
while IFS=$'\t' read -r f1 f2; do myprogram "$f1" "$f2" "${f1}_vs_${f2}.result" &
  (( ++i % $( nproc ) == 0 )) && wait; done; wait
This caps the number of simultaneous jobs at the number of CPU cores available on the system. You can easily change the limit by replacing $( nproc ) with the desired number.
Keep in mind, though, that this is not true load balancing: it does not start a new job as soon as one finishes. Instead, once the maximum number of jobs has been started, it simply waits for all of them to complete before launching the next batch. As a result, the total throughput may be somewhat lower than with parallel, especially if the running time of your program varies widely. If each call takes roughly the same time, the total time should be about the same.
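For illustration, here is a minimal sketch of the batching approach with the limit pulled out into a parameter. The function name run_batched and the body of myprogram are hypothetical stand-ins (the real myprogram is whatever you actually run), and the loop is assumed to read its tab-separated pairs from standard input:

```bash
#!/usr/bin/env bash

# Hypothetical stand-in for the real program: it just records its
# two input arguments in the result file named by the third argument.
myprogram() {
  printf '%s\t%s\n' "$1" "$2" > "$3"
}

# run_batched MAX_JOBS
# Reads tab-separated pairs from stdin and runs myprogram on each pair,
# starting at most MAX_JOBS background jobs per batch.
run_batched() {
  local max_jobs=$1 i=0 f1 f2
  while IFS=$'\t' read -r f1 f2; do
    myprogram "$f1" "$f2" "${f1}_vs_${f2}.result" &
    # After every MAX_JOBS-th launch, block until the whole batch is done.
    (( ++i % max_jobs == 0 )) && wait
  done
  # Collect the last, possibly partial, batch.
  wait
}
```

You could then call it as run_batched "$( nproc )" < pairs.tsv, or with a fixed number such as run_batched 4 < pairs.tsv (pairs.tsv is only a placeholder file name). The trailing wait makes sure the function does not return while jobs from the final batch are still running.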
Hubbitus