Loop inside foreach loop using doParallel

I have a function containing a loop

    myfun = function(z1.d, r, rs){
      x = z1.d[,r]
      or.d = order(as.vector(x), decreasing=TRUE)[rs]
      zz1.d = as.vector(x)
      rl = zz1.d[or.d]
      y = vector()
      for (i in 1:9)
      {
        if(i<9) y[i]=mean( x[(x[,r] >= rl[i] & x[,r] < rl[i+1]),r] )
        else{ y[i] = mean( z1.d[(x >= rl[9]),r] )}
      }
      return(y)
    }

rs is a numeric vector, z1.d is a zoo object, and y is also a numeric vector.

When I try to run the function inside a parallel loop:

    cls = makePSOCKcluster(8)
    registerDoParallel(cls)
    rlarger.d.1 = foreach(r=1:dim(z1.d)[2], .combine = "cbind") %dopar% {
      myfun(z1.d, r, rs)
    }
    stopCluster(cls)

I get the following error:

 Error in { : task 1 failed - "incorrect number of dimensions" 

I do not know why, but I noticed that if I remove the loop from my function, it no longer gives an error.

Also, if I run the same code with %do% instead of %dopar% (so it does not run in parallel), it works fine (slowly, but without errors).

EDIT: as requested here is an example of parameters:

    > dim(z1.d)
    [1] 8766  107
    > z1.d[1:4,1:6]
                        AU_10092 AU_10622 AU_12038 AU_12046 AU_13017 AU_14015
    1966-01-01 23:00:00       NA       NA       NA    1.816        0    4.573
    1966-01-02 23:00:00       NA       NA       NA    9.614        0    4.064
    1966-01-03 23:00:00        0       NA       NA    0.000        0    0.000
    1966-01-04 23:00:00        0       NA       NA    0.000        0    0.000
    > rs
    [1] 300 250 200 150 100  75  50  30  10

r is defined in the foreach loop

2 answers

The error occurs because you did not load zoo on your workers. As a result, the workers do not know how to handle zoo objects properly; instead, they treat them as plain matrices, which do not behave identically when subsetted. A quick fix for your stated problem is to add .packages="zoo" to your foreach call.
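A minimal sketch of that fix, assuming the zoo, foreach, and doParallel packages are installed. The toy z1.d below is made-up data with an assumed shape, not the asker's series, and the loop body only checks whether subsetting still returns a zoo object on the workers:

```r
library(zoo)
library(foreach)
library(doParallel)

# Toy stand-in for the asker's data (assumption: 50 daily rows, 4 series)
set.seed(1)
z1.d <- zoo(matrix(runif(200), ncol = 4),
            order.by = as.Date("2000-01-01") + 0:49)

cls <- makePSOCKcluster(2)
registerDoParallel(cls)

# .packages = "zoo" attaches zoo on every worker before the body runs,
# so `[` dispatches to the zoo method instead of the default one
res <- foreach(r = seq_len(ncol(z1.d)),
               .combine = "cbind",
               .packages = "zoo") %dopar% {
  inherits(z1.d[, r], "zoo")
}

stopCluster(cls)
```

If the workers handle the columns as zoo objects, every entry of res should be TRUE.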

In my opinion, you do not even need parallel computing. You can significantly improve the performance of your function if you use numeric vectors instead of zoo objects:

    # sample time series to match your object size
    set.seed(1234)
    z.test <- as.zoo(replicate(107, sample(c(NA, runif(1000,0,10)), size = 8766, replace = TRUE)))

    myfun_new <- function(z, r, rs){
      x <- as.numeric(z[,r])
      rl <- x[order(x, decreasing=TRUE)[rs]]
      res_dim <- length(rs)
      y = numeric(res_dim)
      for (i in 1:res_dim){
        if (i < res_dim){
          y[i] <- mean( x[(x >= rl[i] & x < rl[i+1])], na.rm = TRUE )
        } else {
          y[i] <- mean( x[(x >= rl[res_dim])], na.rm = TRUE )
        }
      }
      return(y)
    }

Simple timings show an improvement:

    system.time({
      cls = makePSOCKcluster(4)
      registerDoParallel(cls)
      rlarger.d.1 = foreach(r=1:dim(z.test)[2], .packages = "zoo", .combine = "cbind") %dopar% {
        myfun(z.test, r, rs)
      }
      stopCluster(cls)
    })
    ##    user  system elapsed
    ##    0.08    0.10   10.93

    system.time({
      res <- sapply(1:dim(z.test)[2], function(r){ myfun_new(z.test, r, rs) })
    })
    ##    user  system elapsed
    ##    0.48    0.21    0.68

The results are the same, apart from the column names:

    all.equal(res, rlarger.d.1, check.attributes = FALSE)
    ## [1] TRUE

It seems there is an error in your function code.

In line 2, you create a 1-dimensional object:

 x = z1.d[,r] 

In line 9, you treat it as two-dimensional:

 x[some_logic, r] 

This is why you get the "incorrect number of dimensions" error. I do not know why it works with %do%, though.
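The mismatch can be reproduced in base R without zoo. This is only a sketch on a made-up matrix m (a hypothetical stand-in for the data once it is treated as a plain matrix); it captures the error message instead of stopping:

```r
# Hypothetical toy matrix standing in for the data
m <- matrix(1:6, nrow = 3)

# drop = TRUE is the default, so selecting one column returns a plain vector
x <- m[, 1]
has_dim <- !is.null(dim(x))   # the vector has no dim attribute

# Indexing that vector with two subscripts fails; keep the message
msg <- tryCatch(x[, 1], error = function(e) conditionMessage(e))

# Keeping the matrix shape with drop = FALSE makes two subscripts valid again
x2 <- m[, 1, drop = FALSE]
ok <- x2[, 1]
```

In an English locale, msg should hold the same "incorrect number of dimensions" text as the error reported in the question.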

In any case, you need to replace the code inside the for loop with:

    if (i < 9) {
      y[i] = mean( x[(x >= rl[i] & x < rl[i+1])] )
    } else {
      y[i] = mean( x[(x >= rl[9])] )
    }

Or with:

    if (i < 9) {
      y[i] = mean( z1.d[(x >= rl[i] & x < rl[i+1]), r] )
    } else {
      y[i] = mean( z1.d[(x >= rl[9]), r] )
    }

Since you did not provide a reproducible example, I did not test it.
