Retained String heap size in Java

This question is hard for us to understand, and hard to describe in text, but I hope the gist comes across.

I understand that the actual content of a String is held in an internal char array. Normally, the retained heap size of a String is 40 bytes plus the size of its char array. This is explained here. When substring is called, the new String keeps a reference to the original String's char array, so the retained size of that char array can be much larger than the substring itself.

However, when we profile memory usage with YourKit or MAT, something strange happens: the retained size of a String that refers to a char array does not include the retained size of that char array.

An example could be the following (semi-pseudo-code):

    String date = "2011-11-33";  // String: 24 bytes
    date.value = char[1172];     // char array: 2360 bytes

The retained size of the String is reported as 24 bytes, without the retained size of the char array. That may make sense if many Strings refer to the same char array because of many substring operations.

Now, when this String is put into some kind of collection, such as an array or a list, the retained size of that array includes the retained sizes of all its Strings, including the retained size of the char array.

Then we have this situation:

    Array retained size = 300 bytes
      array[0]       = String (40 bytes)
      array[1]       = String (40 bytes)
      array[1].value = char[] (220 bytes)

So you have to examine each array element to work out where the size comes from.

Again, this can be explained by the fact that the array holds all the Strings that reference the same char array, so the array's retained size is correct.

Now we get to the problem.

In a separate object, I keep a reference to the array discussed above and to another array containing the same strings. In both arrays, the strings refer to the same char array. This is expected, since we are talking about the same string. However, the retained size of this char array is counted for both arrays in this new object. In other words, the retained size appears to be doubled. If I delete the first array, the second array still holds a reference to the char array, and vice versa. This is confusing, because Java seems to maintain two separate references to the same char array. How can that be? Is this a real Java memory problem, or is it just the way profilers display the information?
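To make the setup concrete, here is a minimal sketch, assuming both arrays hold the very same String instance. The class and field names (`SharedStringDemo`, `Holder`, `first`, `second`) are hypothetical. It shows that storing a String in two arrays creates two references to one object, not two copies, so removing one array leaves the object reachable through the other.

```java
public class SharedStringDemo {
    // Hypothetical stand-in for the "separate object" described above.
    static final class Holder {
        String[] first;
        String[] second;

        Holder(String[] first, String[] second) {
            this.first = first;
            this.second = second;
        }
    }

    static Holder build() {
        String date = new String("2011-11-33"); // one String object on the heap
        // Storing it in an array copies the reference, not the object.
        return new Holder(new String[] {date}, new String[] {date});
    }

    public static void main(String[] args) {
        Holder holder = build();
        String shared = holder.first[0];
        System.out.println(holder.first[0] == holder.second[0]); // true: one object, two references

        // Dropping the first array does not free the String: it is still
        // reachable through the second array (and vice versa).
        holder.first = null;
        System.out.println(holder.second[0] == shared); // true
    }
}
```

Because both paths lead to one object, a profiler that attributes the char array to each array separately is counting the same bytes twice.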

This has caused us a lot of headaches while trying to track down the huge memory usage in our application.

Again, I hope someone out there can understand the question and explain it.

Thanks for the help.

4 answers

> In a separate object, I keep a reference to the array discussed above and to another array containing the same strings. In both arrays, the strings refer to the same char array. This is expected, since we are talking about the same string. However, the retained size of this char array is counted for both arrays in this new object. In other words, the retained size appears to be doubled.

What you have here is a transitive reference in the dominator tree:

[Image: dominator tree illustrating the transitive reference]

The char array should not be included in the retained size of either array. If a profiler shows it that way, that is misleading.
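One way to see why: an object's retained size is the total shallow size of everything that would become unreachable if that object were removed. The sketch below applies that definition to a tiny hand-built object graph; the class names, node names and shallow sizes are made up for illustration, not taken from a real heap dump. With one array, removing the array frees the String and its char[], so they count toward the array. With two arrays, the String stays reachable through the other array, so it counts toward neither array, only toward the common holder.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RetainedSizeSketch {
    // A tiny, hypothetical heap model: each node has a shallow size and outgoing references.
    static final class Heap {
        final Map<String, Integer> size = new HashMap<>();
        final Map<String, List<String>> refs = new HashMap<>();

        Heap node(String name, int shallowSize, String... targets) {
            size.put(name, shallowSize);
            refs.put(name, Arrays.asList(targets));
            return this;
        }

        // Nodes reachable from root, treating 'removed' as deleted (null = remove nothing).
        Set<String> reachable(String root, String removed) {
            Set<String> seen = new HashSet<>();
            Deque<String> stack = new ArrayDeque<>();
            if (!root.equals(removed)) stack.push(root);
            while (!stack.isEmpty()) {
                String n = stack.pop();
                if (!seen.add(n)) continue; // already visited
                for (String m : refs.get(n)) {
                    if (!m.equals(removed)) stack.push(m);
                }
            }
            return seen;
        }

        // Retained size of x: shallow sizes of everything (x included) that
        // becomes unreachable from root once x is gone.
        int retained(String root, String x) {
            Set<String> after = reachable(root, x);
            int total = 0;
            for (String n : reachable(root, null)) {
                if (!after.contains(n)) total += size.get(n);
            }
            return total;
        }
    }

    // Sizes are made-up round numbers, not measurements.
    static Heap oneArray() {
        return new Heap()
            .node("root", 0, "array")
            .node("array", 24, "string")
            .node("string", 24, "char[]")
            .node("char[]", 2360);
    }

    static Heap twoArrays() {
        return new Heap()
            .node("holder", 16, "array1", "array2")
            .node("array1", 24, "string")
            .node("array2", 24, "string")
            .node("string", 24, "char[]")
            .node("char[]", 2360);
    }

    public static void main(String[] args) {
        System.out.println(oneArray().retained("root", "array"));      // 2408: array owns String and char[]
        System.out.println(twoArrays().retained("holder", "array1"));  // 24: String survives via array2
        System.out.println(twoArrays().retained("holder", "array2"));  // 24: String survives via array1
        System.out.println(twoArrays().retained("holder", "holder"));  // 2448: only the holder retains it all
    }
}
```

Real profilers compute this via the dominator tree rather than by repeated reachability passes, but for a graph this small both give the same numbers.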

Here's how JProfiler shows this situation in its biggest objects view:

[Image: JProfiler biggest objects view]

The String instance that is contained in both arrays is shown outside the array instances, with the label [transitive reference]. If you want to examine the actual paths, you can add the array holder and the String to the graph and find all paths between them:

[Image: JProfiler graph showing all paths between the array holder and the String]

Disclaimer: My company develops JProfiler.


I would say this is just how the profiler displays the information. It does not know that these two arrays should be considered together for "deduplication". How about wrapping the two arrays in some kind of dummy holder object and running your profiler against that? Then it should be able to account for the "double count".
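A minimal sketch of that suggestion, assuming a HotSpot-based JDK: it uses `com.sun.management.HotSpotDiagnosticMXBean`, which is JDK-specific rather than part of the Java SE API. `HolderHeapDump`, `DummyHolder`, `dump()` and the file name are hypothetical. The holder is kept in a static field so it is still live when the heap dump is written; the resulting `.hprof` file can then be opened in YourKit, MAT or JProfiler to compare the retained size of the holder with that of each array.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

public class HolderHeapDump {
    // Hypothetical dummy holder that wraps both arrays, giving them one common owner.
    static final class DummyHolder {
        final String[] first;
        final String[] second;

        DummyHolder(String[] first, String[] second) {
            this.first = first;
            this.second = second;
        }
    }

    // Static so the holder is still reachable (live) while the dump is taken.
    static DummyHolder holder;

    static Path dump() throws IOException {
        String shared = new String(new char[1024]);
        holder = new DummyHolder(new String[] {shared}, new String[] {shared});

        // dumpHeap fails if the target file already exists, so use a fresh directory.
        Path file = Files.createTempDirectory("heapdump").resolve("holder.hprof");
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(file.toString(), true); // true = include only live objects
        return file;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Heap dump written to " + dump());
    }
}
```

In the dump, look up `DummyHolder` in the profiler's dominator tree or biggest objects view: its retained size should include the shared String once, while each array on its own should not.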


If the strings are not interned, they can be equal() but not ==. When you create a String object from a char array, the constructor creates a copy of the char array. (This is the only way to protect an immutable String from later changes to the values in the char array.)
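A small sketch of both points, using only standard `String` methods (the class and method names are hypothetical): `new String(char[])` copies its argument, so later writes to the array are not visible, and two equal non-interned Strings are distinct objects until they are interned.

```java
public class StringCopyDemo {
    // Builds a String from a char array, then overwrites the array.
    static String copyThenMutate() {
        char[] chars = {'a', 'b', 'c'};
        String s = new String(chars); // the constructor copies 'chars'
        chars[0] = 'X';               // does not affect s
        return s;
    }

    public static void main(String[] args) {
        System.out.println(copyThenMutate()); // abc

        String a = new String("abc");
        String b = new String("abc");
        System.out.println(a.equals(b));              // true: same characters
        System.out.println(a == b);                   // false: two distinct objects
        System.out.println(a.intern() == b.intern()); // true: both map to the pooled instance
    }
}
```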


If you run the following with -XX:-UseTLAB:

    // Class wrapper added so the example compiles on its own.
    public class SubstringMemory {
        public static void main(String... args) throws Exception {
            StringBuilder text = new StringBuilder();
            text.append(new char[1024]);
            long free1 = free();
            String str = text.toString();
            long free2 = free();
            String[] array = { str.substring(0, 100), str.substring(101, 200) };
            long free3 = free();
            if (free3 == free2)
                System.err.println("You must use -XX:-UseTLAB");
            System.out.println("To create String with 1024 chars " + (free1 - free2)
                    + " bytes\nand to create an array with two sub-string was " + (free2 - free3));
        }

        private static long free() {
            return Runtime.getRuntime().freeMemory();
        }
    }

it prints

    To create String with 1024 chars 2096 bytes
    and to create an array with two sub-string was 88

You can see that the two substrings use far less memory than you would expect if they did not share the same underlying storage.

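If you look at the code for the String class (Java 6):

    public String substring(int beginIndex, int endIndex) {
        // bounds checks omitted
        return ((beginIndex == 0) && (endIndex == count)) ? this
                : new String(offset + beginIndex, endIndex - beginIndex, value);
    }

    // Package-private constructor: shares the existing value array instead of copying it.
    String(int offset, int count, char value[]) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

You can see that String.substring does not take a copy of the underlying value array. (This applies to Java 6 and early Java 7; since Java 7 update 6, substring copies the characters into a new array.)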


Another thing to consider is -XX:+UseCompressedStrings, an option available in some Java 6 HotSpot releases. It tells the JVM to use byte[] instead of char[] where possible. Java 9 and later do something similar by default with compact strings (-XX:+CompactStrings).

The size of the object headers for String and for arrays differs between 32-bit JVMs, 64-bit JVMs with 32-bit (compressed) references, and 64-bit JVMs with 64-bit references.
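The header size matters because a primitive array's shallow size is roughly header + length × element size, rounded up to the object alignment. The sketch below assumes typical HotSpot values (12-, 16- and 24-byte array headers and 8-byte alignment); these are assumptions, not guarantees of the JVM specification, and `ArraySizeSketch` is a hypothetical name. Under the 64-bit compressed-references assumption, a 1172-element char[] comes to 2360 bytes, consistent with the figure in the question, and a 1024-element char[] to 2064 bytes, which would match the 2096 bytes printed above if the String object itself takes 32 bytes.

```java
public class ArraySizeSketch {
    // Assumed HotSpot array header sizes in bytes (mark word + class pointer + length
    // field, padded). Typical values only; the JVM specification does not fix them.
    static final int HEADER_32_BIT = 12;
    static final int HEADER_64_BIT_COMPRESSED_REFS = 16;
    static final int HEADER_64_BIT_FULL_REFS = 24;

    // Shallow size of a primitive array, rounded up to an assumed 8-byte alignment.
    static long arraySize(int headerBytes, int length, int elementBytes) {
        long raw = headerBytes + (long) length * elementBytes;
        return (raw + 7) / 8 * 8;
    }

    public static void main(String[] args) {
        // char is 2 bytes per element, byte is 1
        System.out.println(arraySize(HEADER_64_BIT_COMPRESSED_REFS, 1172, 2)); // 2360
        System.out.println(arraySize(HEADER_64_BIT_COMPRESSED_REFS, 1024, 2)); // 2064
        System.out.println(arraySize(HEADER_32_BIT, 1172, 2));                 // 2360
        System.out.println(arraySize(HEADER_64_BIT_FULL_REFS, 1172, 2));       // 2368
        System.out.println(arraySize(HEADER_64_BIT_COMPRESSED_REFS, 1172, 1)); // 1192 as byte[]
    }
}
```

The last line shows why byte[]-based strings roughly halve the array cost for ASCII-only text.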

