Can java.lang.String.concat be improved?

I am considering submitting an RFE (Request for Enhancement) to the Oracle bug database that should significantly improve string concatenation performance. Before I do, I would like to hear expert opinions on whether it makes sense.

The idea is based on the fact that the existing String.concat(String) is about twice as fast as StringBuilder when joining 2 strings. The problem is that there is no way to concatenate 3 or more strings this way. External methods cannot do this because String.concat uses the package-private constructor String(int offset, int count, char[] value), which does not copy the char array but uses it directly. This is what gives String.concat its high performance. Although it is in the same package, StringBuilder still cannot use this constructor, because the String's char array would then be exposed to modification.
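To make the aliasing concern concrete, here is a minimal sketch. `Wrapper` is a hypothetical stand-in for a String-like class whose constructor adopts the caller's array instead of copying it; it is not the JDK code, only an illustration of why a mutable builder could not safely hand its buffer to such a constructor.

```java
public class SharedArrayDemo {
    // Hypothetical stand-in for an immutable string that adopts the caller's array.
    static final class Wrapper {
        private final char[] value;

        Wrapper(char[] value) {
            this.value = value; // no defensive copy: fast, but the array is now shared
        }

        @Override
        public String toString() {
            return new String(value); // the public String(char[]) constructor copies
        }
    }

    // Returns the wrapper's contents before and after the caller mutates the array.
    static String[] demo() {
        char[] buf = {'a', 'b', 'c'};
        Wrapper w = new Wrapper(buf);
        String before = w.toString();
        buf[0] = 'X'; // a builder that kept using its buffer could do this on a later write
        String after = w.toString();
        return new String[] {before, after};
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.println(r[0] + " -> " + r[1]); // prints "abc -> Xbc"
    }
}
```

A static concat does not have this problem, because the freshly allocated buffer never escapes the method.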

I suggest adding the following methods to String:

    public static String concat(String s1, String s2)
    public static String concat(String s1, String s2, String s3)
    public static String concat(String s1, String s2, String s3, String s4)
    public static String concat(String s1, String s2, String s3, String s4, String s5)
    public static String concat(String s1, String... array)

Note: this type of overload is used in EnumSet.of for efficiency.
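As a rough illustration of how such an overload set behaves, the sketch below uses a hypothetical class `OverloadDemo` with a hypothetical method `which` that only reports which overload the compiler selected. Java's overload resolution prefers an applicable fixed-arity method over a varargs one, so calls with 2 or 3 arguments avoid allocating the varargs array.

```java
public class OverloadDemo {
    // Fixed-arity overloads: no array is allocated at the call site.
    static String which(String a, String b) {
        return "fixed-2";
    }

    static String which(String a, String b, String c) {
        return "fixed-3";
    }

    // Varargs fallback: the compiler wraps the trailing arguments in a String[].
    static String which(String first, String... rest) {
        return "varargs(" + rest.length + ")";
    }

    public static void main(String[] args) {
        System.out.println(which("a", "b"));           // prints "fixed-2"
        System.out.println(which("a", "b", "c"));      // prints "fixed-3"
        System.out.println(which("a", "b", "c", "d")); // prints "varargs(3)"
    }
}
```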

Here is an implementation of one of the methods; the others work the same way.

    public final class String {
        private final char value[];
        private final int count;
        private final int offset;

        String(int offset, int count, char value[]) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        public static String concat(String s1, String s2, String s3) {
            char buf[] = new char[s1.count + s2.count + s3.count];
            System.arraycopy(s1.value, s1.offset, buf, 0, s1.count);
            System.arraycopy(s2.value, s2.offset, buf, s1.count, s2.count);
            System.arraycopy(s3.value, s3.offset, buf, s1.count + s2.count, s3.count);
            return new String(0, buf.length, buf);
        }
    }

Also, once these methods are added to String, the Java compiler could translate

    String s = s1 + s2 + s3;

into the efficient

    String s = String.concat(s1, s2, s3);

instead of the current, less efficient

    String s = (new StringBuilder(String.valueOf(s1))).append(s2).append(s3).toString();

UPDATE: Performance test. I ran it on my Intel Celeron 925 laptop, concatenating 3 strings. My String2 class emulates exactly how it would work in the real java.lang.String. The string lengths are chosen to put StringBuilder in the most adverse conditions, that is, it needs to expand its internal buffer on each append, while concat always allocates the char[] only once.

    public class String2 {
        private final char value[];
        private final int count;
        private final int offset;

        String2(String s) {
            value = s.toCharArray();
            offset = 0;
            count = value.length;
        }

        String2(int offset, int count, char value[]) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        public static String2 concat(String2 s1, String2 s2, String2 s3) {
            char buf[] = new char[s1.count + s2.count + s3.count];
            System.arraycopy(s1.value, s1.offset, buf, 0, s1.count);
            System.arraycopy(s2.value, s2.offset, buf, s1.count, s2.count);
            System.arraycopy(s3.value, s3.offset, buf, s1.count + s2.count, s3.count);
            return new String2(0, buf.length, buf);
        }

        public static void main(String[] args) {
            String s1 = "1";
            String s2 = "11111111111111111";
            String s3 = "11111111111111111111111111111111111111111";
            String2 s21 = new String2(s1);
            String2 s22 = new String2(s2);
            String2 s23 = new String2(s3);
            long t0 = System.currentTimeMillis();
            for (int i = 0; i < 1000000; i++) {
                // Version 1: the proposed concat.
                String2 s = String2.concat(s21, s22, s23);
                // Version 2: comment out the line above and enable this one to time StringBuilder.
                // String s = new StringBuilder(s1).append(s2).append(s3).toString();
            }
            System.out.println(System.currentTimeMillis() - t0);
        }
    }

Results at 1,000,000 iterations:

    version 1 = ~200 ms
    version 2 = ~400 ms
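The buffer growth that the test description relies on can be observed with StringBuilder.capacity(). The sketch below is an illustration only; `CapacityDemo` and `capacities` are hypothetical names. The documented initial capacity of new StringBuilder(String) is 16 plus the length of the argument; how much the buffer grows afterwards is an implementation detail, so the code reports the values rather than assuming them.

```java
public class CapacityDemo {
    // Returns the builder's capacity after construction and after each append.
    static int[] capacities(String s1, String s2, String s3) {
        StringBuilder sb = new StringBuilder(s1);
        int c0 = sb.capacity(); // documented as 16 + s1.length()
        sb.append(s2);
        int c1 = sb.capacity(); // larger than c0 if the append forced a reallocation
        sb.append(s3);
        int c2 = sb.capacity(); // larger than c1 if the append forced a reallocation
        return new int[] {c0, c1, c2};
    }

    public static void main(String[] args) {
        // Same strings as in the test above.
        int[] c = capacities("1",
                "11111111111111111",
                "11111111111111111111111111111111111111111");
        System.out.println(c[0] + " " + c[1] + " " + c[2]);
    }
}
```

If the three printed values increase, the builder reallocated and copied its buffer on every append, which is the worst case the test was designed to hit.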
+8
java string string-concatenation
3 answers

The fact is that use cases in which the performance of a single string concatenation expression matters are not that common. In most cases where performance is bound by string concatenation, it happens in a loop that builds the final product step by step, and in that context the mutable StringBuilder is the clear winner. That is why I do not see much of a future for a proposal that optimizes a minority concern by changing the fundamental String class. But in any case, as far as the performance comparison goes, your approach does have a significant edge:

    import java.util.Random;

    import com.google.caliper.Runner;
    import com.google.caliper.SimpleBenchmark;

    public class Performance extends SimpleBenchmark {
        final Random rnd = new Random();
        final String as1 = "aoeuaoeuaoeu", as2 = "snthsnthnsth", as3 = "3453409345";
        final char[] c1 = as1.toCharArray(), c2 = as2.toCharArray(), c3 = as3.toCharArray();

        // Emulates the proposed concat: one allocation, three array copies.
        public static char[] concat(char[] s1, char[] s2, char[] s3) {
            char buf[] = new char[s1.length + s2.length + s3.length];
            System.arraycopy(s1, 0, buf, 0, s1.length);
            System.arraycopy(s2, 0, buf, s1.length, s2.length);
            System.arraycopy(s3, 0, buf, s1.length + s2.length, s3.length);
            return buf;
        }

        // StringBuilder presized to the exact final length.
        public static String build(String s1, String s2, String s3) {
            final StringBuilder b = new StringBuilder(s1.length() + s2.length() + s3.length());
            b.append(s1).append(s2).append(s3);
            return b.toString();
        }

        // Plain + operator, as compiled by javac.
        public static String plus(String s1, String s2, String s3) {
            return s1 + s2 + s3;
        }

        public int timeConcat(int reps) {
            int tot = rnd.nextInt();
            for (int i = 0; i < reps; i++) tot += concat(c1, c2, c3).length;
            return tot;
        }

        public int timeBuild(int reps) {
            int tot = rnd.nextInt();
            for (int i = 0; i < reps; i++) tot += build(as1, as2, as3).length();
            return tot;
        }

        public int timePlus(int reps) {
            int tot = rnd.nextInt();
            for (int i = 0; i < reps; i++) tot += plus(as1, as2, as3).length();
            return tot;
        }

        public static void main(String... args) {
            Runner.main(Performance.class, args);
        }
    }

Result:

     0% Scenario{vm=java, trial=0, benchmark=Concat} 65.81 ns; σ=2.56 ns @ 10 trials
    33% Scenario{vm=java, trial=0, benchmark=Build} 102.94 ns; σ=2.27 ns @ 10 trials
    67% Scenario{vm=java, trial=0, benchmark=Plus} 160.14 ns; σ=2.94 ns @ 10 trials

    benchmark    ns linear runtime
       Concat  65.8 ============
        Build 102.9 ===================
         Plus 160.1 ==============================
+7

If you want them to take you seriously, you need to do the hard work of fully implementing, testing and thoroughly benchmarking your proposed changes. A full implementation would include changes to the Java compiler so that it emits bytecode that uses your methods.

Write up the results, and then submit the code changes as a patch to OpenJDK 7 or 8.

My impression is that the Java developers do not have the resources to try out speculative optimization ideas like this one. An RFE without benchmark results and code patches is unlikely to get attention ...

+4

It is always fine to ask them; do not worry.

I would not have so many overloaded versions. In EnumSet the savings can be significant; that is unlikely to be the case in String.

In fact, I think a static method that accepts any number of arguments is better:

    public static String join(String... strings)

since the number of arguments may not be known at compile time.
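As a rough sketch of what such a method could look like when written outside java.lang: `JoinSketch` below is a hypothetical helper, not a JDK class. Because it cannot reach the package-private constructor, it has to go through the public String(char[]) constructor, which copies the buffer once more than the proposal would.

```java
public class JoinSketch {
    // Hypothetical helper: concatenates any number of strings with a single buffer allocation.
    public static String join(String... strings) {
        int total = 0;
        for (String s : strings) {
            total += s.length();
        }
        char[] buf = new char[total];
        int pos = 0;
        for (String s : strings) {
            s.getChars(0, s.length(), buf, pos); // copy s into buf starting at pos
            pos += s.length();
        }
        return new String(buf); // public constructor: copies buf one more time
    }

    public static void main(String[] args) {
        System.out.println(join("a", "bc", "def")); // prints "abcdef"
    }
}
```

The array form also works when the pieces are only known at run time, for example `join(parts)` with a `String[] parts`.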

+1
