Performing bubble sort 5 times slower with --indy

I wrote a bubble sort implementation to play --indy with Groovy a bit and see if the --indy effect --indy noticeable effect on performance.

Essentially, it sorts a list of thousands of random numbers a thousand times and measures the average runtime to sort the list.

Half the time when the list is Integer[] and the other half is ArrayList<Integer> .

The results really baffle me:

 $ groovyc BubbleSort.groovy $ time java -cp ~/.gvm/groovy/current/embeddable/groovy-all-2.4.3.jar:. BubbleSort Average: 22.926ms Min: 11.202ms [...] 26.48s user 0.84s system 109% cpu 25.033 total $ groovyc --indy BubbleSort.groovy $ time java -cp ~/.gvm/groovy/current/embeddable/groovy-all-2.4.3-indy.jar:. BubbleSort Average: 119.766ms Min: 68.251ms [...] 166.05s user 1.52s system 135% cpu 2:03.82 total 

Looking at CPU usage when doing benchmarks, CPU usage is much higher when compiling with --indy than without.

Cpu usage

This intrigued me, so I ran the tests again, but this time the Tukit agent and the processor trace were activated. Here are the recorded call trees:

Without --indy : Call a tree without <code> - indy </code>

With --indy : Call Tree with <code> - indy </code>

And here are the performance diagrams - note that the time scale is different from the fact that --indy code --indy much slower.

Without --indy (1s scale): Performance without <code> - indy </code>

C --indy (60 s scale): Performance with <code> - indy </code>

As you can see, the CPU usage is stabilized at 100% from one core (12.5% ​​on the graph) when compiling without --indy , but when compiling with --indy changes from 12.5% ​​to 35%. Even more confusing is that Yourkit reports only one live thread (and my code uses only the main thread), but it still manages to hold two and a half kernels.

Code compiled with --indy also uses a lot of kernel time at the beginning, although after a while it crashes and stabilizes by 0% - at this point the code seems to be slightly faster (heap usage growth rate increases), and CPU usage increases .

Can someone explain this behavior to me?

Versions:

 $ groovy -v Groovy Version: 2.4.3 JVM: 1.8.0_45 Vendor: Oracle Corporation OS: Linux $ java -version java version "1.8.0_45" Java(TM) SE Runtime Environment (build 1.8.0_45-b14) Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode) 

BubbleSort.groovy:

 class BubbleSort { final def array BubbleSort(final def array) { this.array = array } private void swap(int a, int b) { def tmp = array[a]; array[a] = array[b] array[b] = tmp; } private void rise(int index) { for(int i = index; i > 0; i--) { if(array[i] < array[i - 1]) { swap(i, i-1) } else { break } } } void sort() { for(int i = 1; i < array.size(); i++) { rise i } } final static Random random = new Random() static void main(String[] args) { def n = 1000 def size = 1000 // Warm up doBenchmark 100, size def results = doBenchmark n, size printf("Average: %.3fms%n", results.total / 1e6 / n) printf("Min: %.3fms%n", results.min / 1e6) } private static def doBenchmark(int n, int size) { long total = 0 long min = Long.MAX_VALUE n.times { def array = (1..size).collect { random.nextInt() } if(it % 2) { array = array as Integer[] } def start = System.nanoTime() new BubbleSort<Integer>(array).sort() def end = System.nanoTime() def time = end - start total += time min = Math.min min, time } return [total: total, min: min] } } 

I am not interested in optimizing my implementation of bubble sorting if they are not related to invokedynamic behavior - the goal here is not to write the best possible bubble sorting, but to understand why --indy has such a big negative impact on performance .

Update:

I converted my code to JRuby and tried the same, and the results are similar, although JRuby is not so fast, but invokedynamic :

 $ JAVA_OPTS="-Djruby.compile.invokedynamic=false" jruby bubblesort.rb Average: 78.714ms Min: 35.000ms $ JAVA_OPTS="-Djruby.compile.invokedynamic=true" jruby bubblesort.rb Average: 136.287ms Min: 92.000ms 

Update 2:

If I remove the code that changes the list to Integer[] , half the time will increase significantly, although it is still faster without --indy :

 $ groovyc BubbleSort.groovy $ java -cp ~/.gvm/groovy/current/embeddable/groovy-all-2.4.3.jar:. BubbleSort Average: 29.794ms Min: 26.577ms $ groovyc --indy BubbleSort.groovy $ java -cp ~/.gvm/groovy/current/embeddable/groovy-all-2.4.3-indy.jar:. BubbleSort Average: 37.506ms Min: 33.555ms 

If I do the same with JRuby, invokedynamic is faster:

 $ JAVA_OPTS="-Djruby.compile.invokedynamic=false" jruby bubblesort.rb Average: 34.388ms Min: 31.000ms $ JAVA_OPTS="-Djruby.compile.invokedynamic=true" jruby bubblesort.rb Average: 20.785ms Min: 18.000ms 
+5
source share
1 answer

The answer is actually pretty simple, Groovy doesn't have a pic yet.

... or you can say that we usually have a built-in cache of size 1. This means that every time you change the type of the list array, it will invalidate all the caches that existed before and the cached version will be deleted. That is, for ordinary Groovy it is almost the same as for indy, only that ordinary Groovy uses the generated classes of the runtime, and indy uses invokedynamic / lambda forms. But lambda forms are initially slower, and maximum performance is better. Basically, what you do allows you to start a hotspot from scratch for most method calls, without allowing it to constantly apply optimization. Of course, this is not your fault, but Groovy's mistake for the lack of PIC. and just to make it very clear ... this is not a language problem, it is just something that I have not yet been able to implement.

JRuby, on the other hand, has a PIC and therefore does not have to suffer from the overhead of creating new processing methods all the time.

+7
source

All Articles