Why is StringBuilder#append(int) faster in Java 7 than in Java 8?

While exploring the long-running debate about using "" + n versus Integer.toString(int) to convert an integer primitive to a string, I wrote this JMH microbenchmark:

    @Fork(1)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    @State(Scope.Benchmark)
    public class IntStr {
        protected int counter;

        @GenerateMicroBenchmark
        public String integerToString() {
            return Integer.toString(this.counter++);
        }

        @GenerateMicroBenchmark
        public String stringBuilder0() {
            return new StringBuilder().append(this.counter++).toString();
        }

        @GenerateMicroBenchmark
        public String stringBuilder1() {
            return new StringBuilder().append("").append(this.counter++).toString();
        }

        @GenerateMicroBenchmark
        public String stringBuilder2() {
            return new StringBuilder().append("").append(Integer.toString(this.counter++)).toString();
        }

        @GenerateMicroBenchmark
        public String stringFormat() {
            return String.format("%d", this.counter++);
        }

        @Setup(Level.Iteration)
        public void prepareIteration() {
            this.counter = 0;
        }
    }
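For context, the stringBuilder1 and stringBuilder2 variants mirror what javac does with the "" + n idiom, since string concatenation with + compiles down to a StringBuilder chain. A rough, illustrative sketch of that equivalence (not part of the benchmark itself):

    int n = 42;
    String viaConcat   = "" + n;                                                // what the source says
    String viaBuilder  = new StringBuilder().append("").append(n).toString();   // roughly what javac emits for it
    String viaToString = Integer.toString(n);                                   // the direct conversion it is compared against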

I ran it with the default JMH settings on both Java VMs present on my Linux machine (an up-to-date Mageia 4, 64-bit, Intel i7-3770 processor, 32 GB of RAM). The first JVM was the one supplied with the 64-bit Oracle JDK 8u5:

    java version "1.8.0_05"
    Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
    Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode)

With this JVM, I got almost what I expected:

    Benchmark                  Mode   Samples   Mean        Mean error   Units
    b.IntStr.integerToString   thrpt  20        32317.048   698.703      ops/ms
    b.IntStr.stringBuilder0    thrpt  20        28129.499   421.520      ops/ms
    b.IntStr.stringBuilder1    thrpt  20        28106.692   1117.958     ops/ms
    b.IntStr.stringBuilder2    thrpt  20        20066.939   1052.937     ops/ms
    b.IntStr.stringFormat      thrpt  20        2346.452    37.422       ops/ms

I.e., using the StringBuilder class is slower due to the additional overhead of creating the StringBuilder and appending an empty string. Using String.format(String, ...) is even slower, by an order of magnitude or so.

The other JVM, provided by the distribution, is based on OpenJDK 1.7:

    java version "1.7.0_55"
    OpenJDK Runtime Environment (mageia-2.4.7.1.mga4-x86_64 u55-b13)
    OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode)

The results here were interesting:

    Benchmark                  Mode   Samples   Mean        Mean error   Units
    b.IntStr.integerToString   thrpt  20        31249.306   881.125      ops/ms
    b.IntStr.stringBuilder0    thrpt  20        39486.857   663.766      ops/ms
    b.IntStr.stringBuilder1    thrpt  20        41072.058   484.353      ops/ms
    b.IntStr.stringBuilder2    thrpt  20        20513.913   466.130      ops/ms
    b.IntStr.stringFormat      thrpt  20        2068.471    44.964       ops/ms

Why does StringBuilder.append(int) appear to be so much faster with this JVM? Looking at the source code of the StringBuilder class, I found nothing particularly interesting - the method in question is almost identical to Integer#toString(int). Interestingly, appending the result of Integer.toString(int) (the stringBuilder2 microbenchmark) does not seem to be any faster.

Is this performance discrepancy an issue with the test harness? Or does my OpenJDK JVM contain optimizations that affect this particular code (anti-)pattern?

EDIT:

For a more direct comparison, I installed Oracle JDK 1.7u55:

    java version "1.7.0_55"
    Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
    Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)

The results are similar to OpenJDK:

    Benchmark                  Mode   Samples   Mean        Mean error   Units
    b.IntStr.integerToString   thrpt  20        32502.493   501.928      ops/ms
    b.IntStr.stringBuilder0    thrpt  20        39592.174   428.967      ops/ms
    b.IntStr.stringBuilder1    thrpt  20        40978.633   544.236      ops/ms

This seems to be a more general Java 7 vs. Java 8 issue. Perhaps Java 7 had more aggressive string optimizations?

EDIT 2:

For completeness, here are the string-related VM parameters for both of these JVMs:

For Oracle JDK 8u5:

    $ /usr/java/default/bin/java -XX:+PrintFlagsFinal 2>/dev/null | grep String
    bool  OptimizeStringConcat        = true    {C2 product}
    intx  PerfMaxStringConstLength    = 1024    {product}
    bool  PrintStringTableStatistics  = false   {product}
    uintx StringTableSize             = 60013   {product}

For OpenJDK 1.7:

    $ java -XX:+PrintFlagsFinal 2>/dev/null | grep String
    bool  OptimizeStringConcat        = true    {C2 product}
    intx  PerfMaxStringConstLength    = 1024    {product}
    bool  PrintStringTableStatistics  = false   {product}
    uintx StringTableSize             = 60013   {product}
    bool  UseStringCache              = false   {product}

The UseStringCache option was removed in Java 8 without a replacement, so I doubt it matters. The remaining flags have the same settings.

EDIT 3:

A side-by-side comparison of the source code for the AbstractStringBuilder, StringBuilder and Integer classes from each JDK's src.zip shows nothing remarkable. Apart from many cosmetic and documentation changes, Integer now has some support for unsigned integers, and StringBuilder has been slightly restructured to share more code with StringBuffer. Neither change seems to affect the code paths used by StringBuilder#append(int), although I may have missed something.

Comparing the assembly code generated for IntStr#integerToString() and IntStr#stringBuilder0() is much more interesting. The basic layout of the code generated for IntStr#integerToString() was the same on both JVMs, although Oracle JDK 8u5 seemed more aggressive about inlining some of the calls inside the Integer#toString(int) code. There was a clear correspondence with the Java source code, even for someone with minimal assembly experience.

The assembly code for IntStr#stringBuilder0(), however, was radically different. The code generated by Oracle JDK 8u5 again mapped directly onto the Java source code - I could easily recognize the same layout. By contrast, the code produced by OpenJDK 7 was almost unrecognizable to the untrained eye (for example, mine). The call to new StringBuilder() appears to have been eliminated, as has the creation of the array in the StringBuilder constructor. In addition, the disassembler plugin was unable to provide as many references back to the source code as it did for JDK 8.

I assume that this is either the result of a more aggressive optimization pass in OpenJDK 7 or, more likely, the result of injecting hand-written low-level code for certain StringBuilder operations. I am not sure why this optimization does not happen on my JVM 8 installation, or why the same optimizations were not applied to Integer#toString(int) on JVM 7. I guess someone familiar with the relevant parts of the JRE source code would have to answer these questions...
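For reference, this is the kind of command that can be used to obtain such disassembly (a sketch; it assumes the hsdis disassembler plugin is installed, and the trailing ... stands for the actual benchmark launch):

    $ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly \
           -XX:CompileCommand=print,*IntStr.stringBuilder0 ...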

+76
java performance java-7 java-8 jmh
May 20 '14 at 10:13
2 answers

TL;DR: Side effects in append seem to break the StringConcat optimization.

Very good analysis in the original question and updates!

For completeness, here are a few missing steps:

  • Run both 7u55 and 8u5 with -XX:+PrintInlining. In 7u55 you will see something like this:

      @ 16   org.sample.IntStr::inlineSideEffect (25 bytes)   force inline by CompilerOracle
        @ 4    java.lang.StringBuilder::<init> (7 bytes)   inline (hot)
        @ 18   java.lang.StringBuilder::append (8 bytes)   already compiled into a big method
        @ 21   java.lang.StringBuilder::toString (17 bytes)   inline (hot)

    ... and in 8u5:

      @ 16   org.sample.IntStr::inlineSideEffect (25 bytes)   force inline by CompilerOracle
        @ 4    java.lang.StringBuilder::<init> (7 bytes)   inline (hot)
          @ 3    java.lang.AbstractStringBuilder::<init> (12 bytes)   inline (hot)
            @ 1    java.lang.Object::<init> (1 bytes)   inline (hot)
        @ 18   java.lang.StringBuilder::append (8 bytes)   inline (hot)
          @ 2    java.lang.AbstractStringBuilder::append (62 bytes)   already compiled into a big method
        @ 21   java.lang.StringBuilder::toString (17 bytes)   inline (hot)
          @ 13   java.lang.String::<init> (62 bytes)   inline (hot)
            @ 1    java.lang.Object::<init> (1 bytes)   inline (hot)
            @ 55   java.util.Arrays::copyOfRange (63 bytes)   inline (hot)
              @ 54   java.lang.Math::min (11 bytes)   (intrinsic)
              @ 57   java.lang.System::arraycopy (0 bytes)   (intrinsic)

    You may notice that the 7u55 version is shorter, and it looks like nothing is called after the StringBuilder methods - a good indication that the string concatenation optimization is in effect. Indeed, if you run 7u55 with -XX:-OptimizeStringConcat, the subsequent calls reappear and performance drops to 8u5 levels.

  • OK, so we need to figure out why 8u5 does not do the same optimization. Grep http://hg.openjdk.java.net/jdk9/jdk9/hotspot for "StringBuilder" to find where the VM handles the StringConcat optimization; this brings you to src/share/vm/opto/stringopts.cpp

  • Run hg log src/share/vm/opto/stringopts.cpp to find the latest changes there. One of the candidates:

     changeset:   5493:90abdd727e64
     user:        iveresov
     date:        Wed Oct 16 11:13:15 2013 -0700
     summary:     8009303: Tiered: incorrect results in VM tests stringconcat...
  • Look up the review threads on the OpenJDK mailing lists (easily found by googling the changeset summary): http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2013-October/012084.html

  • Spot "Optimization of string optimization String collapses the pattern [...] into one line selection and generates the result directly. All possible errors that may occur in the optimized code will restart this pattern from the very beginning (starting from the selection of StringBuffer). This means that the whole template should have a side effect. "Eureka?

  • Write out the targeted benchmark:

     @Fork(5)
     @Warmup(iterations = 5)
     @Measurement(iterations = 5)
     @BenchmarkMode(Mode.AverageTime)
     @OutputTimeUnit(TimeUnit.NANOSECONDS)
     @State(Scope.Benchmark)
     public class IntStr {
         private int counter;

         @GenerateMicroBenchmark
         public String inlineSideEffect() {
             return new StringBuilder().append(counter++).toString();
         }

         @GenerateMicroBenchmark
         public String spliceSideEffect() {
             int cnt = counter++;
             return new StringBuilder().append(cnt).toString();
         }
     }
  • Measure it on JDK 7u55, seeing the same performance for the inlined and spliced side effects:

     Benchmark                   Mode   Samples   Mean     Mean error   Units
     osIntStr.inlineSideEffect   avgt   25        65.460   1.747        ns/op
     osIntStr.spliceSideEffect   avgt   25        64.414   1.323        ns/op
  • Measure it on JDK 8u5, seeing the performance degradation with the inlined side effect:

     Benchmark                   Mode   Samples   Mean     Mean error   Units
     osIntStr.inlineSideEffect   avgt   25        84.953   2.274        ns/op
     osIntStr.spliceSideEffect   avgt   25        65.386   1.194        ns/op
  • Submit a bug report ( https://bugs.openjdk.java.net/browse/JDK-8043677 ) to discuss this behavior with the VM folks. The rationale for the original fix is solid; it is interesting, however, whether we can/should bring this optimization back for trivial cases like these.

  • ???

  • PROFIT.

And yes, I should post the results for the benchmark variant that moves the increment out of the StringBuilder chain, performing it before the whole chain (a sketch of that change is shown at the end of this answer). It also switches to average time in ns/op. This is JDK 7u55:

    Benchmark                  Mode   Samples   Mean       Mean error   Units
    osIntStr.integerToString   avgt   25        153.805    1.093        ns/op
    osIntStr.stringBuilder0    avgt   25        128.284    6.797        ns/op
    osIntStr.stringBuilder1    avgt   25        131.524    3.116        ns/op
    osIntStr.stringBuilder2    avgt   25        254.384    9.204        ns/op
    osIntStr.stringFormat      avgt   25        2302.501   103.032      ns/op

And this is 8u5:

    Benchmark                  Mode   Samples   Mean       Mean error   Units
    osIntStr.integerToString   avgt   25        153.032    3.295        ns/op
    osIntStr.stringBuilder0    avgt   25        127.796    1.158        ns/op
    osIntStr.stringBuilder1    avgt   25        131.585    1.137        ns/op
    osIntStr.stringBuilder2    avgt   25        250.980    2.773        ns/op
    osIntStr.stringFormat      avgt   25        2123.706   25.105       ns/op

stringFormat is actually a bit faster in 8u5, and all the other tests come out the same. This strengthens the hypothesis that the side effect breaking the optimization inside the SB chain is the main culprit in the original question.
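For reference, a sketch of what that hoisted-increment change presumably looks like, shown here for stringBuilder0 only (it reuses the original method name; not necessarily the exact code that produced the numbers above):

    @GenerateMicroBenchmark
    public String stringBuilder0() {
        int cnt = this.counter++;                           // side effect hoisted out of the chain
        return new StringBuilder().append(cnt).toString();  // the chain itself is now side-effect free
    }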

+94
May 21 '14 at 19:23

I think this is due to the CompileThreshold flag, which controls when bytecode is compiled into native code by the JIT.

The Oracle JDK has a default value of 10,000, as documented at http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html .

For OpenJDK I could not find up-to-date documentation for this flag, but some mailing-list threads suggest a much lower threshold: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2010-November/004239.html
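A quick way to check the actual value on a given JVM is the same flag-dump technique used in the question, and the threshold can be pinned explicitly for an experiment (the trailing ... stands for the benchmark launch):

    $ java -XX:+PrintFlagsFinal -version 2>/dev/null | grep CompileThreshold
    $ java -XX:CompileThreshold=10000 ...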

Also, try turning Oracle JDK flags such as -XX:+UseCompressedStrings and -XX:+OptimizeStringConcat on and off. I am not sure whether these flags are enabled by default in OpenJDK; maybe someone can confirm.

One experiment you could do is to first run the program through many iterations, say 30,000 loops, call System.gc(), and only then look at the performance. I believe the two JDKs would then perform the same.
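A minimal standalone sketch of that idea (hypothetical code, not part of the original benchmark; JMH's own warmup iterations normally take care of this already):

    public class WarmupExperiment {
        public static void main(String[] args) {
            int sink = 0;
            // Warm-up phase: give the JIT enough invocations to compile the hot path.
            for (int i = 0; i < 30_000; i++) {
                sink += Integer.toString(i).length();
            }
            System.gc(); // collect the warm-up garbage before timing

            // Rough measurement phase.
            final int iterations = 1_000_000;
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                sink += Integer.toString(i).length();
            }
            long elapsed = System.nanoTime() - start;
            System.out.println((double) elapsed / iterations + " ns/op (rough), sink=" + sink);
        }
    }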

Your GC settings matter too: you allocate a lot of objects, and GC could very well be a major part of your run time.

+5
May 20 '14 at 10:36


