Why/when would you not want to enable Java 8 UseStringDeduplication in the JVM?

Java 8 update 20 introduced string deduplication, which is enabled by launching the JVM with the -XX:+UseStringDeduplication option. It saves some memory by making equal String objects share a single backing character array instead of each keeping its own copy. Of course, its effectiveness varies from program to program depending on how Strings are used, but I think it is safe to say that in general it can be considered beneficial for most applications (if not all), which makes me wonder about a few things:

Why is it not enabled by default? Is it because of the overhead of deduplication, or simply because G1GC is still considered new?

Are there (or could there be) any corner cases where you would not want to use deduplication?
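For context, here is a minimal sketch of the kind of workload the flag targets. The class and method names (DedupDemo, makeDuplicates and so on) are made up for illustration. It builds many String objects that are equal by value but distinct on the heap. The JVM options in the comment are the one from the question plus -XX:+UseG1GC, since deduplication is a G1 feature; the statistics flag is, as far as I know, the Java 8 spelling. Whether the statistics show anything depends on the strings surviving enough collections.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.List;

// Run on Java 8u20+ with, for example:
//   java -XX:+UseG1GC -XX:+UseStringDeduplication -XX:+PrintStringDeduplicationStatistics DedupDemo
class DedupDemo {

    // Builds `count` String objects that cycle through `distinctValues` different values.
    // Each runtime concatenation creates a new String object with its own backing array.
    static List<String> makeDuplicates(int count, int distinctValues) {
        List<String> strings = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            strings.add("value-" + (i % distinctValues));
        }
        return strings;
    }

    // Number of distinct String objects (compared by identity, not by value).
    static int distinctObjects(List<String> strings) {
        IdentityHashMap<String, Boolean> seen = new IdentityHashMap<>();
        for (String s : strings) {
            seen.put(s, Boolean.TRUE);
        }
        return seen.size();
    }

    // Number of distinct String values (compared with equals).
    static int distinctValues(List<String> strings) {
        return new HashSet<>(strings).size();
    }

    public static void main(String[] args) {
        List<String> strings = makeDuplicates(100_000, 10);
        System.out.println("objects: " + distinctObjects(strings));
        System.out.println("values:  " + distinctValues(strings));
    }
}
```

The gap between the two counts is the redundancy that deduplication can remove, provided the strings live long enough to be considered.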

2 answers

Cases where string de-duplication could be harmful include:

  • There are lots of strings, but a very low probability of duplicates: the time spent looking for duplicates and the space overhead of the de-duping data structure will not be repaid.
  • There is a reasonable probability of duplicates, but most strings die within a couple of GC cycles anyway (see note 1). De-duplication is less beneficial if the de-duped strings were going to be GC'ed soon anyway.

    (This is not about strings that don't survive the first GC cycle. It would make no sense for the GC to even try to de-dup strings that it knows to be garbage.)
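To get a feel for the first case, you could measure how many duplicates a representative sample of your strings actually contains. This is only a rough sketch: DuplicateRatio is a hypothetical helper, and how you obtain the sample (a heap dump, logging, a test fixture) is up to you.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DuplicateRatio {

    // Fraction of strings whose value was already seen earlier in the list.
    // 0.0 means no duplicates at all; values close to 1.0 mean almost everything is a repeat.
    static double duplicateRatio(List<String> strings) {
        if (strings.isEmpty()) {
            return 0.0;
        }
        Set<String> seen = new HashSet<>();
        int duplicates = 0;
        for (String s : strings) {
            if (!seen.add(s)) { // add() returns false if the value is already present
                duplicates++;
            }
        }
        return (double) duplicates / strings.size();
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("a", "b", "a", "c", "a", "b");
        System.out.println(duplicateRatio(sample)); // 3 of the 6 entries are repeats
    }
}
```

A ratio near zero suggests the first case above; a high ratio says nothing yet about the second case, because it ignores how long the strings live.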

We can only speculate on why the Java team did not enable de-duplication by default, but they are in a much better position to make rational (i.e. evidence-based) decisions on this than you or I. My understanding is that they have access to many large real-world applications for benchmarking and testing the effects of optimizations. They may also have contacts in partner or customer organizations with similarly large code bases and concerns about efficiency, whom they can ask for feedback on whether the optimizations in an early-access release work as expected.

1 - This depends on the value of the StringDeduplicationAgeThreshold JVM setting. It defaults to 3, meaning that (roughly) a string has to survive 3 minor collections or a major collection to be considered for de-duplication. But in any case, if a string is de-duped and then found to be unreachable shortly afterwards, the de-duping overhead will not be repaid for that string.
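If you want to check what a running JVM actually uses for these settings, one option on HotSpot is the HotSpotDiagnosticMXBean. The sketch below assumes a HotSpot-based JVM; the DedupFlags class and the "unknown" fallback are my own choices for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

class DedupFlags {

    // Returns the current value of a HotSpot flag, or "unknown" if this JVM does not expose it.
    static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption(name).getValue();
        } catch (RuntimeException | LinkageError e) {
            // For example: the flag does not exist, or this is not a HotSpot JVM.
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println("UseStringDeduplication = "
                + flagValue("UseStringDeduplication"));
        System.out.println("StringDeduplicationAgeThreshold = "
                + flagValue("StringDeduplicationAgeThreshold"));
    }
}
```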


If you are asking when you should consider enabling de-duping, my advice would be to try it and see whether it helps on a per-application basis. But you need to do some application-level benchmarking (which takes effort!) to be sure that the de-duping is beneficial.

A careful read of JEP 192 would also help you understand the issues and judge how they might apply to your Java application.


I fully understand that this does not answer the question; I just wanted to mention that jdk-9 introduces another optimization, which is on by default:

-XX:+CompactStrings

where Latin-1 characters occupy a single byte each instead of two (as with a char). Because of this change, many internal methods of String have changed: they behave the same for the user, but internally they are faster in many cases.
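The rule can be sketched as follows. This mirrors the idea only, not the JDK's internal code; CompactStringsSketch and its methods are hypothetical names, and the size estimate counts just the character payload, assuming compact strings are enabled.

```java
class CompactStringsSketch {

    // True if every char fits in one byte (value <= 0xFF), i.e. the string is pure Latin-1.
    static boolean isLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Rough size of the character payload only (no object or array headers):
    // one byte per char for Latin-1 content, two bytes per char otherwise.
    static int estimatedPayloadBytes(String s) {
        return isLatin1(s) ? s.length() : 2 * s.length();
    }

    public static void main(String[] args) {
        System.out.println(estimatedPayloadBytes("hello"));        // Latin-1 only
        System.out.println(estimatedPayloadBytes("hello \u4e16")); // contains a non-Latin-1 char
    }
}
```

Note that a single character outside Latin-1 makes the whole string fall back to two bytes per char.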

Also, when two strings are concatenated with the plus sign, javac now generates different bytecode.

There is no bytecode instruction that concatenates two strings, so before jdk-9 javac generated calls to

StringBuilder#append

behind the scenes.

Now the bytecode delegates to

StringConcatFactory#makeConcatWithConstants

or

StringConcatFactory#makeConcat

via the invokedynamic bytecode instruction:

    0: aload_0
    1: aload_2
    2: aload_1
    3: invokedynamic #8,  0   // InvokeDynamic #0:makeConcatWithConstants:(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;
    8: areturn

How the two strings are concatenated is now a runtime decision. It could still be a StringBuilder, or it could be a concatenation of byte arrays, etc. All you know is that this can change, and that you will get the fastest solution available.
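To make the delegation a bit more tangible, here is a small sketch that requires Java 9 or newer. ConcatSketch and its method names are made up; the first method is ordinary source-level concatenation (compile it and run javap -c ConcatSketch to see the invokedynamic), and the second calls the bootstrap method by hand, which is roughly what the invokedynamic call site does on first use.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

// Requires Java 9 or newer.
class ConcatSketch {

    // Plain concatenation; when targeting jdk-9+ javac compiles this to invokedynamic by default.
    static String plus(String a, String b) {
        return a + "-" + b;
    }

    // Calls the bootstrap method directly. In the recipe, \u0001 marks an argument slot;
    // everything else (here "-") is a constant baked into the recipe.
    static String viaFactory(String a, String b) {
        try {
            CallSite site = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(),
                    "concat", // call site name, chosen arbitrarily here
                    MethodType.methodType(String.class, String.class, String.class),
                    "\u0001-\u0001");
            return (String) site.getTarget().invokeExact(a, b);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(plus("foo", "bar"));
        System.out.println(viaFactory("foo", "bar"));
    }
}
```

Both methods produce the same string; which code actually does the work behind the call site is left to the runtime.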

EDIT

I just debugged this and saw that there are quite a few strategies for how these strings are appended:

    private enum Strategy {
        /**
         * Bytecode generator, calling into {@link java.lang.StringBuilder}.
         */
        BC_SB,

        /**
         * Bytecode generator, calling into {@link java.lang.StringBuilder};
         * but trying to estimate the required storage.
         */
        BC_SB_SIZED,

        /**
         * Bytecode generator, calling into {@link java.lang.StringBuilder};
         * but computing the required storage exactly.
         */
        BC_SB_SIZED_EXACT,

        /**
         * MethodHandle-based generator, that in the end calls into {@link java.lang.StringBuilder}.
         * This strategy also tries to estimate the required storage.
         */
        MH_SB_SIZED,

        /**
         * MethodHandle-based generator, that in the end calls into {@link java.lang.StringBuilder}.
         * This strategy also estimate the required storage exactly.
         */
        MH_SB_SIZED_EXACT,

        /**
         * MethodHandle-based generator, that constructs its own byte[] array from
         * the arguments. It computes the required storage exactly.
         */
        MH_INLINE_SIZED_EXACT
    }

Default value:

MH_INLINE_SIZED_EXACT
