Unexpected scaling results in Java Fork-Join (Java 8)

I recently ran several scalability experiments with Java Fork-Join. I used the non-default ForkJoinPool constructor ForkJoinPool(int parallelism), passing the desired parallelism (number of worker threads) as the constructor argument.

In particular, using the following code snippet:

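import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public static void main(String[] args) throws InterruptedException {
    // args[0]: parallelism level (number of worker threads)
    ForkJoinPool pool = new ForkJoinPool(Integer.parseInt(args[0]));
    pool.invoke(new ParallelLoopTask());
}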

static class ParallelLoopTask extends RecursiveAction {

    final int n = 1000;

    @Override
    protected void compute() {
        RecursiveAction[] T = new RecursiveAction[n];
        for(int p = 0; p < n; p++){
            T[p] = new DummyTask();
            T[p].fork();
        }
        for(int p = 0; p < n; p++){
            T[p].join();
        }
        /*
        //The problem does not occur when tasks are joined in the reverse order, i.e.
        for(int p = n-1; p >= 0; p--){
            T[p].join();
        }
        */
    }
}


public static class DummyTask extends RecursiveAction {
    // performs some dummy work

    final int N = 10000000;

    // avoid memory-bus contention by keeping all accesses within the (per-core) cache
    double val = 1;

    @Override
    protected void compute() {
        for(int j = 0; j < N; j++){
            if(val < 11){
                val *= 1.1;
            }else{
                val = 1;
            }
        }
    }
}

I got these results on a processor with 4 physical and 8 logical cores (using Java 8: jre1.8.0_45):

T1: 11730

T2: 2381 (speedup: 4.93)

T4: 2463 (speedup: 4.76)

T8: 2418 (speedup: 4.85)

Using Java 7 (jre1.7.0), I get:

T1: 11938

T2: 11843 (speedup: 1.01)

T4: 5133 (speedup: 2.33)

T8: 2607 (speedup: 4.58)

(where TP is the runtime in ms using parallelism level P)
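The speedup figures above are simply T1 / TP. As a minimal sketch, the snippet below recomputes them from the reported runtimes and also prints the parallel efficiency (speedup divided by P). The efficiency column is not part of the original measurements and the class name SpeedupTable is hypothetical; both are added only for illustration.

```java
import java.util.Locale;

public class SpeedupTable {

    // Speedup of a run with runtime tp relative to the single-worker runtime t1.
    static double speedup(double t1, double tp) {
        return t1 / tp;
    }

    // Parallel efficiency: speedup per worker. Values above 1 indicate superlinear speedup.
    static double efficiency(double t1, double tp, int p) {
        return speedup(t1, tp) / p;
    }

    public static void main(String[] args) {
        int[] levels = {1, 2, 4, 8};
        // Runtimes in ms, copied from the measurements reported in the question.
        double[] java8 = {11730, 2381, 2463, 2418};
        double[] java7 = {11938, 11843, 5133, 2607};

        for (int i = 0; i < levels.length; i++) {
            int p = levels[i];
            System.out.println(String.format(Locale.ROOT,
                    "P=%d  Java 8: S=%.2f E=%.2f  |  Java 7: S=%.2f E=%.2f",
                    p,
                    speedup(java8[0], java8[i]), efficiency(java8[0], java8[i], p),
                    speedup(java7[0], java7[i]), efficiency(java7[0], java7[i], p)));
        }
    }
}
```

An efficiency well above 1 (as for Java 8 with P = 2) is what makes the single-shot Java 8 numbers look suspicious, whereas the Java 7 numbers stay at or below 1 throughout.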


BTW: , , , 24 parallelism 2 ...?

EDIT:

Results using JMH (jdk1.8.0_45), run with -bm avgt -f 1 (= 1 fork, 20 warmup + 20 measurement iterations):

T1: 11,664

11,664 ±(99.9%) 0,044 s/op [Average]
(min, avg, max) = (11,597, 11,664, 11,810), stdev = 0,050
CI (99.9%): [11,620, 11,708] (assumes normal distribution)

T2: 4,134 (speedup: 2,82)

4,134 ±(99.9%) 0,787 s/op [Average]
(min, avg, max) = (3,045, 4,134, 5,376), stdev = 0,906
CI (99.9%): [3,348, 4,921] (assumes normal distribution)

T4: 2,972 (speedup: 3,92)

2,972 ±(99.9%) 0,212 s/op [Average]
(min, avg, max) = (2,375, 2,972, 3,200), stdev = 0,245
CI (99.9%): [2,759, 3,184] (assumes normal distribution)

T8: 2,845 (speedup: 4,10)

2,845 ±(99.9%) 0,306 s/op [Average]
(min, avg, max) = (2,277, 2,845, 3,310), stdev = 0,352
CI (99.9%): [2,540, 3,151] (assumes normal distribution)

These results are closer to what one would expect, i.e. T1 > T2 > T4 ~ T8 in runtime. Still, a few points remain unclear:

  • The difference in T2 between Java 7 and Java 8. One possible explanation is that in Java 8 the worker executing the parallel loop does not sit idle while joining, but instead finds another task to execute.
  • Superlinear speedup (3x) with 2 workers. Also note that T2 seems to increase from iteration to iteration (see the output below; the same happens, to a lesser extent, with P = 4 and P = 8). The times in the first warmup iterations are similar to those reported above. Perhaps the warmup period should be longer, but is it not strange that the execution time increases? I would expect it to decrease.
  • Finally, I still find it curious that many more dummy tasks are started and not yet completed than there are worker threads.
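The join order mentioned in the commented-out part of ParallelLoopTask can be tried in isolation. The sketch below is a scaled-down variant of the code in the question, not a reproduction of the benchmark: the class names JoinOrderSketch, SmallTask and Loop, the task count, and the loop length are all assumptions chosen so that it finishes quickly. The ForkJoinTask documentation recommends joining the most recently forked task first, because forked tasks sit on the worker's own deque and the latest one can usually be taken back and run directly by the joining worker.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.AtomicInteger;

public class JoinOrderSketch {

    // Scaled-down stand-in for DummyTask (hypothetical workload size).
    static class SmallTask extends RecursiveAction {
        final AtomicInteger done;
        double val = 1;

        SmallTask(AtomicInteger done) { this.done = done; }

        @Override
        protected void compute() {
            for (int j = 0; j < 100_000; j++) {
                val = (val < 11) ? val * 1.1 : 1;
            }
            done.incrementAndGet(); // count completed tasks
        }
    }

    // Forks n tasks, then joins them either in fork order or in reverse order.
    static class Loop extends RecursiveAction {
        final int n;
        final boolean reverse;
        final AtomicInteger done;

        Loop(int n, boolean reverse, AtomicInteger done) {
            this.n = n;
            this.reverse = reverse;
            this.done = done;
        }

        @Override
        protected void compute() {
            RecursiveAction[] tasks = new RecursiveAction[n];
            for (int p = 0; p < n; p++) {
                tasks[p] = new SmallTask(done);
                tasks[p].fork();
            }
            if (reverse) {
                for (int p = n - 1; p >= 0; p--) tasks[p].join(); // most recently forked first
            } else {
                for (int p = 0; p < n; p++) tasks[p].join();      // order used in the question
            }
        }
    }

    // Runs the loop once and returns the number of completed tasks.
    static int run(int parallelism, int n, boolean reverse) {
        AtomicInteger done = new AtomicInteger();
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            pool.invoke(new Loop(n, reverse, done));
        } finally {
            pool.shutdown();
        }
        return done.get();
    }

    public static void main(String[] args) {
        for (boolean reverse : new boolean[] {false, true}) {
            long start = System.nanoTime();
            int completed = run(2, 200, reverse);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println((reverse ? "reverse" : "forward") + " join: "
                    + completed + " tasks in " + ms + " ms");
        }
    }
}
```

A single timed run like this is only a quick check; any timing difference between the two orders should be confirmed with a proper harness such as JMH before drawing conclusions.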
JMH output for parallelism 2:
Run progress: 0,00% complete, ETA 00:00:40
Fork: 1 of 1
Warmup Iteration   1: 2,365 s/op
Warmup Iteration   2: 2,341 s/op
Warmup Iteration   3: 2,393 s/op
Warmup Iteration   4: 2,323 s/op
Warmup Iteration   5: 2,925 s/op
Warmup Iteration   6: 3,040 s/op
Warmup Iteration   7: 2,304 s/op
Warmup Iteration   8: 2,347 s/op
Warmup Iteration   9: 2,939 s/op
Warmup Iteration  10: 3,083 s/op
Warmup Iteration  11: 3,004 s/op
Warmup Iteration  12: 2,327 s/op
Warmup Iteration  13: 3,083 s/op
Warmup Iteration  14: 3,229 s/op
Warmup Iteration  15: 3,076 s/op
Warmup Iteration  16: 2,325 s/op
Warmup Iteration  17: 2,993 s/op
Warmup Iteration  18: 3,112 s/op
Warmup Iteration  19: 3,074 s/op
Warmup Iteration  20: 2,354 s/op
Iteration   1: 3,045 s/op
Iteration   2: 3,094 s/op
Iteration   3: 3,113 s/op
Iteration   4: 3,057 s/op
Iteration   5: 3,050 s/op
Iteration   6: 3,106 s/op
Iteration   7: 3,080 s/op
Iteration   8: 3,370 s/op
Iteration   9: 4,482 s/op
Iteration  10: 4,325 s/op
Iteration  11: 5,002 s/op
Iteration  12: 4,980 s/op
Iteration  13: 5,121 s/op
Iteration  14: 4,310 s/op
Iteration  15: 5,146 s/op
Iteration  16: 5,376 s/op
Iteration  17: 4,810 s/op
Iteration  18: 4,320 s/op
Iteration  19: 5,249 s/op
Iteration  20: 4,654 s/op
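To put a number on the drift, the sketch below averages the 20 measurement iterations listed above and compares the first ten with the last ten. It assumes the log is the P = 2 run (its minimum and maximum match the T2 summary); the class name IterationDrift is hypothetical and the values are copied from the log with decimal commas written as points.

```java
import java.util.Arrays;

public class IterationDrift {

    // Measurement iterations 1..20 from the JMH log, in s/op.
    static final double[] ITERATIONS = {
        3.045, 3.094, 3.113, 3.057, 3.050, 3.106, 3.080, 3.370, 4.482, 4.325,
        5.002, 4.980, 5.121, 4.310, 5.146, 5.376, 4.810, 4.320, 5.249, 4.654
    };

    // Mean of xs[from..to), NaN for an empty range.
    static double mean(double[] xs, int from, int to) {
        return Arrays.stream(xs, from, to).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        int n = ITERATIONS.length;
        System.out.println("overall mean:     " + mean(ITERATIONS, 0, n));
        System.out.println("first half mean:  " + mean(ITERATIONS, 0, n / 2));
        System.out.println("second half mean: " + mean(ITERATIONS, n / 2, n));
    }
}
```

The overall mean agrees with the reported T2 average, while the second half is clearly slower than the first. This suggests the wide confidence interval for T2 comes from a shift during the run rather than from random noise.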

, , . , . . SO . , jmh Java9, .

EDIT:


. 2010 . , join . , .

(n = 100000000) (). VisualVM . , ..

