Java 7 String - substring complexity

Question

Up to and including Java 6, substring on a String ran in constant time. In Java 7, why did they decide to copy the char array, degrading it to linear time complexity, when something like StringBuilder was meant for exactly that?

+25

java java-7

anoopelias Apr 20 '13 at 17:55

5 answers

If you have a long-lived small substring of a short-lived, large parent string, the large char[] backing the parent string will not be eligible for garbage collection until the small substring goes out of scope. This means a substring can take up much more memory than people expect.

The only time the Java 6 method performed much better was when someone took a large substring from a large parent string, which is very rare.

Clearly, they decided that the tiny performance cost of this change was outweighed by the hidden memory problems caused by the old way. The determining factor is that the problem was hidden, not that a workaround existed.
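The scenario described above can be sketched as follows. This is only an illustration: the `KeyCache` class, the `record` method, and the 8-character id format are hypothetical, and the comments summarize the behaviour discussed in this thread rather than showing JDK internals.

```java
import java.util.HashMap;
import java.util.Map;

class KeyCache {
    // Long-lived map: its keys outlive the lines they were cut from.
    static final Map<String, Integer> COUNTS = new HashMap<>();

    // Hypothetical record format: an 8-character id followed by a large payload.
    static void record(String line) {
        // Old behaviour (Java 6): id shares the whole char[] of line, so the
        // payload stays reachable for as long as the map holds the key.
        // New behaviour (Java 7): id owns an 8-char copy, so the char[] of
        // line can be collected once line itself is unreachable.
        String id = line.substring(0, 8);
        COUNTS.merge(id, 1, Integer::sum);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("ID000001");
        for (int i = 0; i < 100_000; i++) {
            sb.append('x'); // simulate a large payload
        }
        record(sb.toString());
        record(sb.toString());
        System.out.println(COUNTS); // {ID000001=2}
    }
}
```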

+8

ILMTitan Apr 20 '13 at 18:07

This will affect the complexity of data structures such as suffix arrays by a fair margin. Java should provide an alternative method for getting part of the original string.
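One possible way to get a non-copying view, shown here only as a sketch, is `java.nio.CharBuffer.wrap(CharSequence, int, int)`, which wraps the string in a read-only buffer instead of copying its characters. The `SubstringView` class and `view` helper are hypothetical names, and whether this is fast enough for a given suffix-array implementation is an assumption to verify.

```java
import java.nio.CharBuffer;

class SubstringView {
    // Returns a read-only view of text[start, end) without copying characters.
    static CharSequence view(String text, int start, int end) {
        return CharBuffer.wrap(text, start, end);
    }

    public static void main(String[] args) {
        String text = "banana";
        CharSequence suffix = view(text, 2, text.length());
        System.out.println(suffix);           // nana
        System.out.println(suffix.length());  // 4
        System.out.println(suffix.charAt(0)); // n
    }
}
```

The result is a CharSequence rather than a String, so calling toString() on it allocates a new String again. For a suffix array, storing plain int offsets into the original string avoids creating any substring objects at all.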

+5

Heisenberg Feb 17 '14 at 13:53

It's just their crappy way of working around some limitations of JVM garbage collection.

Before Java 7, if we wanted to avoid the garbage collection problem, we could always copy the substring instead of keeping a reference to it. It took just one extra call to the copy constructor:

 String smallStr = new String(largeStr.substring(0,2));

But now we can no longer get a constant-time substring at all. What a disaster.

+3

Alex Apr 18 '15 at 5:29

The main motivation, I believe, is the eventual "co-location" of a String and its char[]. Right now they are located far apart in memory, which is a serious penalty in terms of cache lines. If each String owns its char[], the JVM can merge them together, and reading will be much faster.

+1

ZhongYu Jun 09 '15 at 18:52

Andy Thomas · Accepted Answer · 2013-04-20 18:21

Why they decided this is discussed in Oracle bug #4513622: (str) keeping a substring of a field prevents GC for object:

When you call String.substring as in the example, a new character array for storage is not allocated. It uses the character array of the original String. Thus, the character array backing the original String cannot be GC'd until the substring's references can also be GC'd. This is an intentional optimization to prevent excessive allocations when using substring in common scenarios. Unfortunately, the problematic code hits a case where the overhead of the original array is noticeable. It is difficult to optimize for both edge cases. Any optimization for space/size trade-offs is generally complex and can often be platform-specific.

See also this note, which points out that what was once an optimization had become a pessimization according to testing:

For a long time, preparations and planning have been underway to remove the offset and count fields from java.lang.String. These two fields enable multiple String instances to share the same backing character buffer. Shared character buffers were an important optimization for old benchmarks, but with current real-world code and benchmarks it is actually better not to share backing buffers. Shared char array backing buffers only "win" with very heavy use of String.substring. The negatively impacted situations can include parsers and compilers; however, current testing shows that overall this change is beneficial.
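To make the role of the offset and count fields concrete, here is a minimal model of the two substring strategies. It is a sketch under stated assumptions, not the JDK source: the `Slice` class and its method names are hypothetical, and only the three fields mirror the layout described in the note.

```java
import java.util.Arrays;

class SliceDemo {
    // Hypothetical stand-in for a string with a backing array, offset and count.
    static final class Slice {
        final char[] value; // backing character buffer
        final int offset;   // where this slice starts inside value
        final int count;    // number of characters in this slice

        Slice(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        // Sharing strategy: O(1), but keeps the whole parent buffer reachable.
        Slice sharedSubstring(int begin, int end) {
            return new Slice(value, offset + begin, end - begin);
        }

        // Copying strategy: O(end - begin), but holds only what it needs.
        Slice copiedSubstring(int begin, int end) {
            char[] copy = Arrays.copyOfRange(value, offset + begin, offset + end);
            return new Slice(copy, 0, copy.length);
        }

        @Override
        public String toString() {
            return new String(value, offset, count);
        }
    }

    public static void main(String[] args) {
        Slice parent = new Slice(new char[1_000_000], 0, 1_000_000);
        Slice shared = parent.sharedSubstring(0, 2);
        Slice copied = parent.copiedSubstring(0, 2);
        System.out.println(shared.value.length); // 1000000
        System.out.println(copied.value.length); // 2
    }
}
```

In this model the two-character shared slice still pins the million-character buffer, while the copied slice holds an array of length two. That is the trade-off between constant-time substring and hidden memory retention debated in this thread.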
