For communication between processes, an interesting place to start is the help page ?socketConnection and the code in the chunk marked "## Not run:". Start one R process and run
con1 <- socketConnection(port = 6011, server=TRUE)
This process is acting as a server, listening on a particular port for some information. Now start a second R process and type
con2 <- socketConnection(Sys.info()["nodename"], port = 6011)
con2 in process 2 has made a socket connection with con1 in process 1. Go back to process 1 and write out the R object LETTERS
writeLines(LETTERS, con1)
and retrieve them on con2:
readLines(con2)
So you've communicated between processes without writing to disk. Some important concepts are also implicit here, e.g., blocking vs. non-blocking connections. Communication is not limited to a single machine, provided the ports are accessible across whatever network the computers are on. This is the basis for makePSOCKcluster in the parallel package, with the addition that process 1 actually uses the system command and a script in the parallel package to start process 2. The object returned by makePSOCKcluster is subsettable, so that you can dedicate a fraction of your cluster to a particular task. In principle you could arrange for the spawned nodes to communicate with one another independently of the node that did the spawning.
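For instance, a minimal sketch of subsetting a PSOCK cluster (the worker count and the toy task are arbitrary choices here, not from the original):

library(parallel)
cl <- makePSOCKcluster(4L)    # process 1 spawns 4 worker R processes
# the cluster object is subsettable: dedicate two workers to one task
parLapply(cl[1:2], 1:2, function(i) Sys.getpid())
stopCluster(cl)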
An interesting exercise is to do the same using the fork-like commands in the parallel package (on non-Windows). A high-level version of this is on the ?mcparallel help page, e.g.,
p <- mcparallel(1:10)
q <- mcparallel(1:20)
# wait for both jobs to finish and collect all results
res <- mccollect(list(p, q))
but this is built on top of the lower-level sendMaster and friends (peek at the source code of mcparallel and mccollect).
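For example, to take that peek (sendMaster is unexported, hence the ::: below):

parallel:::sendMaster    # the low-level sender a forked child uses
parallel::mcparallel     # print the body to see how it builds on sendMaster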
The Rmpi package takes an approach like the PSOCK example, where the manager uses scripts to spawn workers, with communication over MPI rather than sockets. But a different approach, worthy of a weekend project if you have a functioning MPI implementation, is to implement a script that does the same calculation on different data on each node, and then collates the results onto a single node, using commands like mpi.comm.rank , mpi.barrier , mpi.send.Robj and mpi.recv.Robj . A sketch of such a script follows.
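This is only a minimal sketch, assuming a working MPI installation and Rmpi, launched SPMD-style with something like mpirun -np 4 Rscript script.R so that communicator 0 holds all the ranks; the toy calculation is an arbitrary stand-in:

library(Rmpi)
rank <- mpi.comm.rank(comm = 0)    # this node's id within the communicator
size <- mpi.comm.size(comm = 0)    # total number of nodes
# same calculation, different data: each rank sums its own slice of 1:1000
part <- sum(seq(rank + 1, 1000, by = size))
mpi.barrier(comm = 0)              # wait until every rank has finished
if (rank == 0) {
    # rank 0 collates the results from all the other ranks
    parts <- lapply(seq_len(size - 1),
                    function(i) mpi.recv.Robj(source = i, tag = 0, comm = 0))
    print(part + Reduce(`+`, parts, 0))
} else {
    mpi.send.Robj(part, dest = 0, tag = 0, comm = 0)
}
mpi.quit()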
A fun weekend project would use the parallel package to implement a workflow that involves parallel computation, but not of the mclapply variety, e.g., where one process harvests data from a web site and then passes it to another process that draws pretty pictures. The input to the first process might well be JSON, but the communication within R is probably much more appropriately R data objects.
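A minimal sketch of that workflow, reusing the socket idiom from above (the jsonlite package, the URL, and the plotting step are illustrative assumptions, not part of the original):

## process 1: harvest data and ship it as an R object
library(jsonlite)                       # assumed JSON parser
con <- socketConnection(port = 6011, server = TRUE, blocking = TRUE,
                        open = "a+b")   # binary mode, so serialize() works
dat <- fromJSON("https://example.com/data.json")   # hypothetical URL
serialize(dat, con)                     # send R data, not raw JSON
close(con)

## process 2: receive the R object and draw the pretty pictures
con <- socketConnection(Sys.info()["nodename"], port = 6011,
                        blocking = TRUE, open = "a+b")
dat <- unserialize(con)
close(con)
plot(dat)                               # assuming the parsed data is plottable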