Is it possible to extend STDIN to parallel processes?

Question

Is it possible to extend STDIN to parallel processes?

Given the following example input STDIN:

foo bar bar baz === qux bla === def zzz yyy

Is it possible to split it on a separator (in this case '===') and pass each piece via stdin to a command that runs in parallel?

So the example above would result in 3 parallel processes (say, of a command called do.sh), where each instance receives one piece of the data on STDIN, for example:

do.sh (instance 1) gets this via STDIN:

 foo bar bar baz

do.sh (instance 2) gets this via STDIN:

 qux bla

do.sh (instance 3) gets this via STDIN:

 def zzz yyy

I suppose something like this is possible using xargs or GNU parallel, but I don't know how to do this.

+6

bash parallel-processing process stdin xargs

Erik Jan 11 '11 at 11:07


3 answers

In general, no. One reason is that standard I/O, when reading from a file rather than a terminal, reads data in blocks of BUFSIZ bytes at a time, where BUFSIZ is usually a power of 2, such as 512 or larger. If the data is in a file, one process will read the whole file and the others will see nothing if they share the same open file description (similar to a file descriptor, but several file descriptors, possibly in different processes, can share one open file description). If they do not share the open file description, each process will read the whole of the same file.

So you need one process that reads the file, knows that it must hand the data out to three processes, and knows how to connect to them. Perhaps your distributor program starts the three processes and writes to their separate inputs. Or perhaps the distributor connects to three sockets and writes to each of them.
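A minimal Bash sketch of the first variant, in which the distributor starts the processes itself and writes to their separate inputs. The function names `distribute` and `worker` are made up for illustration, and `worker` is only a stand-in for do.sh. The sketch assumes the input is small enough to hold in a shell variable.

```bash
#!/usr/bin/env bash

# Hypothetical stand-in for do.sh: reports what arrived on its own stdin.
worker() {
    local id=$1 data
    data=$(cat)
    printf 'instance %s received: [%s]\n' "$id" "$data"
}

# Distributor: the only process that reads the original stdin.
distribute() {
    local input section n=0
    input=$(cat)       # read everything; trailing newlines are dropped
    input+='==='       # sentinel so the last section ends like the others
    while [[ -n $input ]]; do
        section=${input%%===*}    # text before the first separator
        input=${input#*===}       # text after that separator
        n=$((n + 1))
        # One background process per section, fed through its own pipe.
        printf '%s\n' "$section" | worker "$n" &
    done
    wait               # block until every worker has finished
}

printf 'foo bar bar baz === qux bla === def zzz yyy\n' | distribute
```

The workers run concurrently, so their output lines can appear in any order. Each section keeps the whitespace that surrounded the separator; trim it in the worker if that matters.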

Your example does not show or describe what should happen if the input splits into 37 sections.

I have a home-brewed program called tpipe, which is similar to the Unix tee command, but it writes a copy of (all of) its standard input to each of several processes and, by default, to standard output as well. This may be the right foundation for what you need (it at least covers part of the process control). Contact me if you want a copy - see my profile.

If you use Bash, you can use regular tee with process substitution to simulate tpipe. See the article for an illustration of how.
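As a rough sketch of the tee approach (this is not tpipe itself, and the function name `fan_out` and the two `sed` consumers are placeholders), each `>(...)` process below receives a complete copy of standard input. Note that this duplicates the input rather than splitting it on a separator.

```bash
#!/usr/bin/env bash

# Send a full copy of stdin to two consumer processes.
# tee's own stdout copy is discarded; the consumers still write to the
# stdout that was in effect when the function was called.
fan_out() {
    tee >(sed 's/^/copy A: /') >(sed 's/^/copy B: /') >/dev/null
}

# The trailing cat only returns once both consumers have closed the pipe,
# so the pipeline does not finish before their output has arrived.
printf 'foo\nbar\n' | fan_out | cat
```

The two consumers run concurrently, so the relative order of the "copy A" and "copy B" lines is not guaranteed.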

See also SF 96245 for another version of the same information, plus a link to a program called pee, which is very similar to tpipe (same basic idea, slightly different implementation in various respects).

+2

Jonathan Leffler Jan 11 '11 at 14:37

You can do this using named pipes. Named pipes let you treat pipes as files. You can create several named pipes and have your other programs read from them.

I am not very familiar with named pipes, but I have used them occasionally in situations like this.
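A small Bash sketch of the idea, assuming two readers and hard-coded sample data. The function name `fifo_demo`, the FIFO names `in1` and `in2`, and the `sed` readers are invented for the example; real use would put the other programs in place of `sed` and have a distributor decide what to write to which pipe.

```bash
#!/usr/bin/env bash

fifo_demo() {
    local dir
    dir=$(mktemp -d) || return 1
    mkfifo "$dir/in1" "$dir/in2"      # two named pipes, visible as files

    # Start the readers in the background; each blocks until a writer
    # opens its FIFO, then reads until that writer closes it.
    sed 's/^/reader 1: /' <"$dir/in1" &
    sed 's/^/reader 2: /' <"$dir/in2" &

    # The writer sends a different piece of data to each named pipe.
    printf 'foo bar bar baz\n' >"$dir/in1"
    printf 'qux bla\n' >"$dir/in2"

    wait                               # let both readers finish
    rm -r "$dir"
}

fifo_demo
```

The readers run in parallel, so their output lines may appear in either order.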

+1

David W. Jan 11 '11 at 16:34


Ole Tange · Accepted Answer · 2011-01-11T14:26:39+0000

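GNU Parallel can do this as of version 20110205. With --pipe it feeds blocks of standard input to the stdin of each job, --recend sets the string that ends a record, and --rrs removes that separator from what the command receives: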

 cat | parallel --pipe --recend '===\n' --rrs do_stuff
