"Error in unserialize" - foreach / doSNOW / snow with SOCK (Windows)

I am running a parallel operation using a SOCK cluster with workers on the local machine. If I limit the set that I iterate over (in one test, 70 tasks instead of the full 135), everything works fine. If I run the full set, I get the error message "Error in unserialize(socklist[[n]]) : error reading from connection".

  • I opened the port in the Windows Firewall (both inbound and outbound) and allowed all access for Rscript / R.

  • This cannot be a timeout problem, as the socket timeout is set to 365 days.

  • This is not a problem with any specific task, because all tasks run fine sequentially. They also run fine in parallel if I split the data set in half and do two separate parallel runs.

  • My best guess is that too much data is being transferred over the sockets. There does not seem to be a cluster setting that throttles the amount of data sent.
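One way to check the data-volume guess is to measure how large each task's input is once serialized, since that is roughly what travels over a socket. The sketch below uses only base R; `serialized_size` and the `tasks` list are hypothetical stand-ins for the real inputs, and it does not show that data volume is the actual cause.

```r
# Size in bytes of an object once serialized, roughly what a socket would carry.
serialized_size <- function(x) length(serialize(x, connection = NULL))

# Hypothetical task inputs standing in for the real 135 tasks.
tasks <- lapply(1:135, function(i) rnorm(1000))

# Per-task size in bytes and the total in megabytes.
sizes <- vapply(tasks, serialized_size, numeric(1))
total_mb <- sum(sizes) / 1024^2
```

Comparing `total_mb` for the 70-task and 135-task runs would show whether the full run transfers noticeably more data.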

I am at a loss here. Has anyone seen this problem before, or can anyone suggest a fix?

Here is the code I use to configure the cluster:

cluster = makeCluster(degreeOfParallelism, type = "SOCK", outfile = "")
registerDoSNOW(cluster)

Edit
Although this problem shows up reliably with the full data set, it also appears from time to time with a reduced data set. This may mean that it is not just a data-volume issue.

Edit 2
I dug a little deeper, and it turned out that my function has a random component that sometimes makes a task throw an error. If I run the tasks sequentially, I am told at the end of the run which task failed. If I run them in parallel, I get the "unserialize" error. I tried wrapping the code that each task runs in a tryCatch call with error = function(e) { stop(e) }, but that also produces the "unserialize" error. I am confused, because I thought snow handles errors by passing them back to the master?
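A handler that calls stop(e) simply re-throws the error on the worker. An alternative is to return the failure as an ordinary value so the master can see which task failed. The sketch below is a minimal illustration in base R; `flaky_task` and `safe_task` are hypothetical names, and it is not confirmed that this removes the "unserialize" error in the original setup (if a worker process itself dies, no R-level handler will help).

```r
# Hypothetical task with a random failure, standing in for the real function.
flaky_task <- function(i) {
  if (runif(1) < 0.3) stop("task ", i, " failed")
  i^2
}

# Run a task and return a record; an error becomes data instead of being re-thrown.
safe_task <- function(i, task = flaky_task) {
  tryCatch(
    list(id = i, ok = TRUE, value = task(i), message = NA_character_),
    error = function(e) {
      list(id = i, ok = FALSE, value = NA_real_, message = conditionMessage(e))
    }
  )
}

# Sequential demonstration; the same wrapper could form the body of a foreach loop.
results <- lapply(1:10, safe_task)
failed_ids <- vapply(Filter(function(r) !r$ok, results),
                     function(r) r$id, numeric(1))
```

In a parallel run the wrapper would be called inside `foreach(...) %dopar%`, with the helper functions made available on the workers, for example through foreach's `.export` argument.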

foreach parallel-processing r
1 answer

I reported this problem to the snow author, but unfortunately got no answer.

Edit
I have not seen this problem in a while. I switched to parallel / doParallel. In addition, I now use try() to wrap any code that runs in parallel. I can no longer reproduce the original problem.
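A minimal sketch of that setup, assuming the foreach and doParallel packages are installed. The worker count, the task body, and the deliberate failure in task 3 are made up for illustration.

```r
library(foreach)
library(doParallel)

# Two local workers; the count is an arbitrary choice for this sketch.
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Each task body is wrapped in try(), so a failure comes back as a
# "try-error" object instead of aborting the whole loop.
results <- foreach(i = 1:8) %dopar% {
  try({
    if (i == 3) stop("task 3 failed")  # hypothetical failure
    sqrt(i)
  }, silent = TRUE)
}

parallel::stopCluster(cl)

# Indices of the tasks that failed.
failed <- which(vapply(results, inherits, logical(1), what = "try-error"))
```

After the loop, `failed` identifies the tasks that need to be inspected or rerun, much like the report a sequential run gives.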
