I suggest:
- splitting each loop into its own method, so that optimizing one does not affect the others;
- ignoring the first 10K iterations;
- running the test for at least 2 seconds;
- running the test several times to make sure the results are reproducible.
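The suggestions above can be sketched as a small hand-rolled harness. This is a minimal sketch under stated assumptions, not the code used for the numbers in this answer. The names BenchmarkHarness, nanosPerCall, repeat, concat and builder are all hypothetical. For serious measurements, a dedicated tool such as JMH deals with these pitfalls far more thoroughly.

```java
public class BenchmarkHarness {

    // Hypothetical helper: warm up first (untimed), then run the task in batches
    // until at least minNanos have elapsed. Returns the average nanoseconds per call.
    public static double nanosPerCall(Runnable task, int warmupCalls, long minNanos) {
        for (int i = 0; i < warmupCalls; i++) task.run();   // warm-up, results discarded
        final int batch = 1_000;                            // amortize System.nanoTime() cost
        long calls = 0;
        long start = System.nanoTime();
        long elapsed;
        do {
            for (int i = 0; i < batch; i++) task.run();
            calls += batch;
            elapsed = System.nanoTime() - start;
        } while (elapsed < minNanos);
        return (double) elapsed / calls;
    }

    // Repeat the whole measurement so you can see whether the numbers are reproducible.
    public static double[] repeat(Runnable task, int repeats, int warmupCalls, long minNanos) {
        double[] results = new double[repeats];
        for (int r = 0; r < repeats; r++) results[r] = nanosPerCall(task, warmupCalls, minNanos);
        return results;
    }

    // Written to after each call so the JIT cannot simply drop the work.
    static volatile int sink;

    // Each candidate lives in its own method, per the first suggestion.
    public static void concat() {
        String s = "";
        for (int i = 0; i < 10; i++) s += i;
        sink = s.length();
    }

    public static void builder() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10; i++) sb.append(i);
        sink = sb.length();
    }

    public static void main(String[] args) {
        long twoSeconds = 2_000_000_000L;   // run each measurement for at least 2 seconds
        int warmup = 10_000;                // ignore the first 10K calls
        double[] c = repeat(BenchmarkHarness::concat, 5, warmup, twoSeconds);
        double[] b = repeat(BenchmarkHarness::builder, 5, warmup, twoSeconds);
        for (int r = 0; r < c.length; r++)
            System.out.printf("run %d: concat %.1f ns, builder %.1f ns%n", r + 1, c[r], b[r]);
    }
}
```

Both tasks go through the same task.run() call site, so the JIT's profile for that call is shared between them. This is one of the effects that JMH is designed to avoid.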
If the code runs fewer than 10,000 times, it may never be compiled to native code, because the default is -XX:CompileThreshold=10000. Part of this warm-up is the JVM collecting statistics on how best to optimize the code. However, when a loop triggers compilation, the whole method containing it is compiled. This can make the later loops in that method look either a) better, because they were already compiled before they started, or b) worse, because they were compiled without any statistics about how they actually run.
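Whether 10,000 is the threshold on your JVM depends on the VM and its flags. HotSpot has used tiered compilation by default since Java 8, and when it is enabled CompileThreshold is ignored in favour of the Tier* thresholds. A minimal sketch, assuming a HotSpot-based JDK, reads the current values through com.sun.management.HotSpotDiagnosticMXBean. The class name JitThresholds and the helper vmOption are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class JitThresholds {

    // Returns the current value of a HotSpot VM flag, or "unavailable" if this JVM
    // does not expose it (e.g. a non-HotSpot VM, or a flag your version lacks).
    public static String vmOption(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) return "unavailable";
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {   // unknown flag or MXBean not supported
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        // With TieredCompilation=true, the tiered thresholds apply instead of CompileThreshold.
        String[] flags = {"CompileThreshold", "TieredCompilation",
                          "Tier3InvocationThreshold", "Tier4InvocationThreshold"};
        for (String flag : flags)
            System.out.println(flag + " = " + vmOption(flag));
    }
}
```

Running the benchmark with -XX:+PrintCompilation also shows when each method gets compiled. On-stack-replacement compilations of a running loop are marked with %.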
Consider the following code

    public class StringConcatBenchmark { // the class name is arbitrary

        public static void main(String... args) {
            int runs = 1000;
            // warm up String.valueOf(int) before anything is timed
            for (int i = 0; i < runs; i++)
                String.valueOf(i);

            System.out.printf("%-10s%-10s%-10s%-9s%-9s%n",
                    "+ oper", "SBuffer", "SBuilder", "+/Buff", "Buff/Builder");
            // repeat the whole measurement five times to check reproducibility
            for (int t = 0; t < 5; t++) {
                long sConcatTime = timeStringConcat(runs);
                long sBuffTime = timeStringBuffer(runs);
                long sBuilderTime = timeStringBuilder(runs);
                // average time per append, then the two ratios
                System.out.printf("%,7dns %,7dns %,7dns ",
                        sConcatTime / runs, sBuffTime / runs, sBuilderTime / runs);
                System.out.printf("%8.2f %8.2f%n",
                        (double) sConcatTime / sBuffTime, (double) sBuffTime / sBuilderTime);
            }
        }

        // each result is stored here so the JIT cannot easily eliminate the work
        public static double dontOptimiseAway = 0;

        // each variant lives in its own method so compiling one does not affect the others;
        // every call performs runs appends in total (100 strings of runs / 100 appends each)
        private static long timeStringConcat(int runs) {
            long sConcatStart = System.nanoTime();
            for (int j = 0; j < 100; j++) {
                String s = "";
                for (int i = 0; i < runs; i += 100) {
                    s += String.valueOf(i);
                }
                dontOptimiseAway = Double.parseDouble(s);
            }
            return System.nanoTime() - sConcatStart;
        }

        private static long timeStringBuffer(int runs) {
            long sBuffStart = System.nanoTime();
            for (int j = 0; j < 100; j++) {
                StringBuffer buff = new StringBuffer();
                for (int i = 0; i < runs; i += 100)
                    buff.append(i);
                dontOptimiseAway = Double.parseDouble(buff.toString());
            }
            return System.nanoTime() - sBuffStart;
        }

        private static long timeStringBuilder(int runs) {
            long sBuilderStart = System.nanoTime();
            for (int j = 0; j < 100; j++) {
                StringBuilder buff = new StringBuilder();
                for (int i = 0; i < runs; i += 100)
                    buff.append(i);
                dontOptimiseAway = Double.parseDouble(buff.toString());
            }
            return System.nanoTime() - sBuilderStart;
        }
    }

With runs = 1000, it prints:

    + oper    SBuffer   SBuilder  +/Buff   Buff/Builder
      6,848ns   3,169ns   3,287ns     2.16     0.96
      6,039ns   2,937ns   3,311ns     2.06     0.89
      6,025ns   3,315ns   2,276ns     1.82     1.46
      4,718ns   2,254ns   2,180ns     2.09     1.03
      5,183ns   2,319ns   2,186ns     2.23     1.06

However, with runs = 10,000:

    + oper    SBuffer   SBuilder  +/Buff   Buff/Builder
      3,791ns     400ns     357ns     9.46     1.12
      1,426ns     139ns     113ns    10.23     1.23
        323ns     141ns     117ns     2.29     1.20
        317ns     115ns      78ns     2.76     1.47
        317ns     127ns     103ns     2.49     1.23

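Warm-up rows matter, and the +/Buff column of the runs = 10,000 output shows why. A small sketch does the arithmetic on that column, averaging all five rows versus dropping the first two, which appear to still be warming up. The names WarmupEffect and mean are made up for illustration.

```java
import java.util.Arrays;

public class WarmupEffect {

    // "+/Buff" column of the runs = 10,000 output, in the order it was printed.
    public static final double[] CONCAT_OVER_BUFFER = {9.46, 10.23, 2.29, 2.76, 2.49};

    // Mean of the values after skipping the first `skip` rows.
    public static double mean(double[] values, int skip) {
        return Arrays.stream(values).skip(skip).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        System.out.printf("all 5 rows:        %.2f%n", mean(CONCAT_OVER_BUFFER, 0)); // 5.45
        System.out.printf("skip first 2 rows: %.2f%n", mean(CONCAT_OVER_BUFFER, 2)); // 2.51
    }
}
```

Including the two warm-up rows more than doubles the apparent ratio, so a single run of this benchmark could easily mislead.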
And with runs = 100,000, I get:

    + oper    SBuffer   SBuilder  +/Buff   Buff/Builder
      3,946ns     195ns     128ns    20.23     1.52
      2,364ns     113ns      86ns    20.80     1.32
      2,189ns     142ns      95ns    15.34     1.49
      2,036ns     142ns      96ns    14.31     1.48
      2,566ns     114ns      88ns    22.46     1.29

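Note: the + operation gets slower per append as runs grows because building a string with += in a loop is O(N^2). Each += creates a new String and copies every character appended so far.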
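The O(N^2) cost can be illustrated by counting characters copied instead of timing anything. This is a simplified model, not a measurement. Since String is immutable, each `s += piece` produces a new String holding everything so far; javac implements + with a StringBuilder chain before Java 9 and with StringConcatFactory (JEP 280) since, but either way the full length is copied. The builder side assumes an initial capacity of 16 and growth to 2 × capacity + 2, similar to OpenJDK. The class name and the piece length of 3 are arbitrary.

```java
public class ConcatCost {

    // Model of `s += piece`: every += builds a new String containing all characters so far.
    public static long charsCopiedByConcat(int pieces, int pieceLength) {
        long copied = 0, length = 0;
        for (int i = 0; i < pieces; i++) {
            length += pieceLength;
            copied += length;            // the whole current string is copied
        }
        return copied;
    }

    // Model of StringBuilder.append: copy only the new piece, plus the existing
    // contents whenever the buffer has to grow (assumed growth: 2 * capacity + 2).
    public static long charsCopiedByBuilder(int pieces, int pieceLength) {
        long copied = 0, length = 0, capacity = 16;
        for (int i = 0; i < pieces; i++) {
            if (length + pieceLength > capacity) {
                copied += length;        // move old contents into the larger buffer
                capacity = Math.max(2 * capacity + 2, length + pieceLength);
            }
            length += pieceLength;
            copied += pieceLength;       // copy the appended piece
        }
        return copied;
    }

    public static void main(String[] args) {
        for (int n = 1_000; n <= 8_000; n *= 2)
            System.out.printf("%,6d pieces: concat %,12d chars, builder %,8d chars%n",
                    n, charsCopiedByConcat(n, 3), charsCopiedByBuilder(n, 3));
    }
}
```

For example, charsCopiedByConcat(1000, 1) returns 500,500, and doubling to 2,000 pieces gives 2,001,000, about four times as much. The builder count only roughly doubles.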