Nested foreach loops in R to update shared array

I am trying to use a couple of foreach loops in R to populate a shared array in parallel. A very simplified version of what I'm trying to do is:

    library(foreach)
    set.seed(123)
    x <- matrix(NA, nrow = 8, ncol = 2)

    foreach(i=1:8) %dopar% {
        foreach(j=1:2) %do% {
            l <- runif(1, i, 100)
            x[i,j] <- i + j + l  # This is much more complicated in my real code.
        }
    }

I would like the code to update the matrix x in parallel and produce output like this:

    > x
              [,1]      [,2]
    [1,]  31.47017  82.04221
    [2,]  45.07974  92.53571
    [3,]  98.22533  12.41898
    [4,]  59.69813  95.67223
    [5,]  63.38633  55.37840
    [6,] 102.94233  56.61341
    [7,]  78.01407  69.25491
    [8,]  26.46907 100.78390

However, I cannot figure out how to get the array to update. I tried putting the x <- assignment elsewhere, but that does not seem to work. I think it will be very easy to fix, but my searches have not gotten me there yet. Thanks.

+7
foreach parallel-processing r
2 answers
foreach loops are used for their return value, like lapply. In that respect they are very different from for loops, which are used for their side effects. With appropriate .combine functions, the inner foreach can return vectors that the outer foreach joins row-wise into a matrix:

    x <- foreach(i=1:8, .combine='rbind') %dopar% {
        foreach(j=1:2, .combine='c') %do% {
            l <- runif(1, i, 100)
            i + j + l
        }
    }

You can also use the nesting operator, %:%:

    x <- foreach(i=1:8, .combine='rbind') %:%
        foreach(j=1:2, .combine='c') %dopar% {
            l <- runif(1, i, 100)
            i + j + l
        }
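To check that both forms build the same 8 x 2 matrix, here is a minimal sketch that swaps the random body for the deterministic i + j, so the result can be compared against a known answer. It assumes the sequential backend (registerDoSEQ) so that it runs without a cluster; the names x_inner, x_nested and expected are made up for this illustration.

```r
library(foreach)

# Assumption: use the sequential backend so %dopar% runs without a cluster.
registerDoSEQ()

# Form 1: inner %do% loop nested inside the outer %dopar% loop.
x_inner <- foreach(i = 1:8, .combine = 'rbind') %dopar% {
  foreach(j = 1:2, .combine = 'c') %do% {
    i + j  # deterministic stand-in for the real computation
  }
}

# Form 2: the same loops merged with the nesting operator %:%.
x_nested <- foreach(i = 1:8, .combine = 'rbind') %:%
  foreach(j = 1:2, .combine = 'c') %dopar% {
    i + j
  }

# Reference result: element [i, j] equals i + j.
expected <- outer(1:8, 1:2, "+")

# rbind attaches row names, so drop dimnames before comparing.
same_inner  <- isTRUE(all.equal(unname(x_inner), expected))
same_nested <- isTRUE(all.equal(unname(x_nested), expected))
print(c(same_inner, same_nested))
```

With a real parallel backend registered, the %:% form hands out all 16 (i, j) combinations as tasks, whereas the first form only parallelises over i.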

Note that set.seed will probably not do what you want: it only seeds the local R session, while the random numbers are generated in separate worker sessions, possibly on different computers.
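One way around the seeding problem, sketched here as a suggestion rather than as part of the original answer, is to draw all the random numbers in the master session (where set.seed does apply) and let the workers only transform them. The helper draw_x and the matrix u are hypothetical names; the sketch assumes doParallel with a two-worker cluster. Since i + u * (100 - i) with u uniform on [0, 1] is uniform on [i, 100], it plays the role of runif(1, i, 100).

```r
library(foreach)
library(doParallel)

cl <- makeCluster(2)       # assumption: two local worker processes
registerDoParallel(cl)

# Hypothetical helper: seed and draw in the master, compute on the workers.
draw_x <- function(seed) {
  set.seed(seed)
  # One uniform draw on [0, 1] per matrix cell, generated locally.
  u <- matrix(runif(8 * 2), nrow = 8, ncol = 2)
  foreach(i = 1:8, .combine = 'rbind') %:%
    foreach(j = 1:2, .combine = 'c') %dopar% {
      l <- i + u[i, j] * (100 - i)  # rescale to [i, 100]
      i + j + l
    }
}

x1 <- draw_x(123)
x2 <- draw_x(123)
stopCluster(cl)

# Both runs used the same locally drawn numbers.
reproducible <- identical(x1, x2)
print(reproducible)
```

If the random draws have to happen inside the workers, the doRNG package is designed for reproducible foreach loops and may be worth a look.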

+11

Just to add to Steve's answer: I think the decisive point is that the parallel backend starts several Rscript.exe processes (as seen in the Task Manager). Objects that are used inside foreach, in your case x, are then copied into the memory allocated for each of these processes. I'm not sure how copying is handled in the foreach package, but with the *ply functions of the plyr package you need to explicitly specify the objects you want to copy. Different processes do not share their memory. (I don't know of other R packages that can use shared memory ...)
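The effect of this copying can be seen in a small sketch: each worker assigns into its own copy of x, and the x in the master session is left exactly as it was. This assumes doParallel with a two-worker cluster; the variable untouched is a name made up for the illustration.

```r
library(foreach)
library(doParallel)

cl <- makeCluster(2)       # assumption: two local worker processes
registerDoParallel(cl)

x <- matrix(NA_real_, nrow = 8, ncol = 2)

# The assignment below changes only the copy of x held by the worker.
invisible(foreach(i = 1:8) %dopar% {
  x[i, 1] <- i
  NULL                     # nothing useful is returned to the master
})

stopCluster(cl)

# The master's x was never modified, so every cell is still NA.
untouched <- all(is.na(x))
print(untouched)
```

This is why the result has to come back through the return value of foreach (combined with .combine) rather than through an assignment inside the loop body.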

You can demonstrate that the matrix x is actually copied by using .Internal(inspect(x)) to print the memory location of x.

    library(foreach)
    library(doParallel)

    x <- matrix(1:16, nrow = 8, ncol = 2)

    # print memory location of x
    capture.output(.Internal(inspect(x)))[1]

    # create parallel backend; in our case two Rscript.exe processes
    workers <- makeCluster(2)
    registerDoParallel(workers)

    y <- foreach(i=1:8, .combine='rbind') %dopar% {
        # return memory location of x
        capture.output(.Internal(inspect(x)))[1]
    }

    # print matrix y
    # there should be two different memory locations -
    # according to the two Rscript.exe processes started above
    y

    # close parallel backend
    stopCluster(workers)

Matrix y reads

             [,1]
    result.1 "@0x0000000003dab910 13 INTSXP g0c5 [NAM(1),ATT] (len=16, tl=0) 1,2,3,4,5,..."
    result.2 "@0x0000000003dab9b0 13 INTSXP g0c5 [NAM(1),ATT] (len=16, tl=0) 1,2,3,4,5,..."
    result.3 "@0x0000000003dab910 13 INTSXP g0c5 [NAM(2),ATT] (len=16, tl=0) 1,2,3,4,5,..."
    result.4 "@0x0000000003dab910 13 INTSXP g0c5 [NAM(2),ATT] (len=16, tl=0) 1,2,3,4,5,..."
    ...

Here you should find two different memory addresses.

+2