I am running a parallel operation using a SOCK cluster with workers on the local machine. If I limit the set of tasks I iterate over (in one test, 70 instead of the full 135 tasks), everything works fine. If I run the full set, I get the error: "Error in unserialize(socklist[[n]]): error reading from connection."
I have unblocked the port in the Windows Firewall (both inbound and outbound) and allowed full access for Rscript/R.
This cannot be a timeout problem, as the socket timeout is set to 365 days.
This is not a problem with any specific task, because the full set runs fine sequentially (it also runs fine in parallel if I split the data set in half and do two separate parallel runs, roughly as sketched below).
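That manual split looks roughly like this (a minimal sketch; taskList and runTask are placeholder names for my real task list and per-task function):

library(foreach)

# Run each half of the tasks as its own %dopar% loop, then combine the results.
n = length( taskList )
firstHalf  = foreach( task = taskList[ 1:(n %/% 2) ] ) %dopar% runTask( task )
secondHalf = foreach( task = taskList[ (n %/% 2 + 1):n ] ) %dopar% runTask( task )
results = c( firstHalf, secondHalf )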
The best explanation I can come up with is that too much data is being transferred over the sockets, but there does not seem to be any way to throttle how much data the cluster sends at once.
I don't see how to work around this. Has anyone seen this problem before, or can anyone suggest a fix?
Here is the code I use to configure the cluster:
library(doSNOW)   # loads snow (makeCluster) and foreach alongside registerDoSNOW

cluster = makeCluster( degreeOfParallelism , type = "SOCK" , outfile = "" )
registerDoSNOW( cluster )
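The loop itself then looks roughly like this (a minimal sketch; taskList and runTask again stand in for my actual inputs and per-task function):

# Dispatch one task per iteration to the SOCK workers, then shut the cluster down.
results = foreach( task = taskList ) %dopar% runTask( task )
stopCluster( cluster )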
Edit
Although the problem shows up most reliably with the full data set, it also appears from time to time with the reduced set, which suggests this is not simply a data-volume issue.
Edit 2
I dug a little deeper, and it turns out that my function has a random component that sometimes causes a task to throw an error. If I run the tasks sequentially, at the end of the run I am told which task failed. If I run in parallel, I get the "unserialize" error instead. I tried wrapping the code that each task runs in a tryCatch call with error = function(e) stop(e), but that also produces the "unserialize" error. Am I confused here? I thought snow handled errors by passing them back to the master.
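For reference, the wrapper I tried looks roughly like this (a sketch; doTask is a placeholder for the real body of each task):

result = tryCatch(
    doTask( task ),
    error = function( e ) stop( e )   # re-throws the error on the worker
)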