Parallel Computing by Reference Classes

Question


I have a list of fairly large objects to which I want to apply a complex function in parallel, but my current method uses too much memory. I thought Reference Classes might help, but using mclapply to modify them does not work.

The function modifies the object itself, so I overwrite the original object with the new one. Since the object is a list and I only modify a small part of it, I was hoping that R's copy-on-modify semantics would avoid making multiple copies; however, when I run it, that does not seem to be the case. Here is a small example of the base R approach I have been using. It correctly resets the balances to zero.

    library(parallel)

    ## make a list of accounts, each with a balance
    ## and a function to reset the balance
    foo <- lapply(1:5, function(x) list(balance=x))
    reset1 <- function(x) {x$balance <- 0; x}
    foo[[4]]$balance
    ## 4   ## BEFORE reset
    foo <- mclapply(foo, reset1)
    foo[[4]]$balance
    ## 0   ## AFTER reset

It seems that Reference Classes might help, since they are mutable, and with lapply they do what I expect: after the reset, the balance is zero.

    Account <- setRefClass("Account",
                           fields=list(balance="numeric"),
                           methods=list(reset=function() {balance <<- 0}))

    foo <- lapply(1:5, function(x) Account$new(balance=x))
    foo[[4]]$balance
    ## 4
    invisible(lapply(foo, function(x) x$reset()))
    foo[[4]]$balance
    ## 0

But when I use mclapply, the balance is not reset. Note that if you are on Windows or have mc.cores=1, lapply will be called instead.

    foo <- lapply(1:5, function(x) Account$new(balance=x))
    foo[[4]]$balance
    ## 4
    invisible(mclapply(foo, function(x) x$reset()))
    foo[[4]]$balance
    ## 4

What's happening? How can I work with Reference Classes in parallel? Is there a better way to avoid unnecessarily copying objects?


parallel-processing r reference-class

Aaron Dec 6 '13 at 18:12


1 answer

Aaron · Answer 1 · 2013-12-06T20:58:39+0000

I think that forked processes, although they have access to all the variables in the workspace, cannot change them in the parent process. This works, but I don't yet know whether it helps with the memory problems or not.

    foo <- mclapply(foo, function(x) {x$reset(); x})
    foo[[4]]$balance
    ## 0
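On the memory question, one possible variation is sketched below. Since mclapply sends each worker's return value back to the parent in serialized form, returning the whole object means every large object is copied back. If only a small part changes, the worker could return just that part and the parent could apply it to its own objects. This is a sketch under assumptions, not a measured improvement: the names n_cores and new_balances are made up for illustration, and it assumes the changed part (here the balance) can be separated cleanly from the rest of the object.

```r
library(parallel)

Account <- setRefClass("Account",
                       fields  = list(balance = "numeric"),
                       methods = list(reset = function() { balance <<- 0 }))

foo <- lapply(1:5, function(x) Account$new(balance = x))

## Assumption: on a Unix-alike, mclapply() forks when mc.cores > 1.
## On Windows mc.cores must be 1, and lapply() is used instead.
n_cores <- if (.Platform$OS.type == "unix") 2L else 1L

## Each worker resets its own copy of the object and returns
## only the new balance, not the whole object.
new_balances <- mclapply(foo, function(x) {
  x$reset()
  x$balance
}, mc.cores = n_cores)

## After a forked run, the parent's objects have not been changed yet;
## the small results are applied to them here.
for (i in seq_along(foo)) foo[[i]]$balance <- new_balances[[i]]

foo[[4]]$balance
## 0
```

Whether this actually lowers peak memory depends on how much of each object the function touches inside the worker, so it would need to be profiled on the real data.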
