This is a tricky question for us. It is hard to describe in text, but I hope the gist comes across.
I understand that the actual content of a String is held in an internal char array. Normally, the retained heap size of a String is 40 bytes plus the size of the character array. This is explained here. When substring is called, the new string keeps a reference to the original string's character array, so the retained size of that character array can be much larger than the string itself.
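For readers on newer JDKs: the sharing described above is how `String.substring` behaved before Java 7 update 6, when a String stored a `char[]` plus an offset and a count. Since 7u6, `substring` copies the characters, and since Java 9 the internal field is a `byte[]` (compact strings). The sketch below does not touch the real `java.lang.String`; it models the old layout with a hypothetical `LegacyString` class, so it runs on any JDK. `compactCopy` mirrors the usual pre-7u6 workaround of wrapping a substring in `new String(...)` to drop the oversized parent array.

```java
import java.util.Arrays;

public class SharedSubstringDemo {
    // Hypothetical stand-in for the pre-7u6 java.lang.String layout:
    // a (possibly shared) backing array plus an offset and a count.
    static final class LegacyString {
        final char[] value;
        final int offset;
        final int count;

        LegacyString(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        static LegacyString of(String s) {
            return new LegacyString(s.toCharArray(), 0, s.length());
        }

        // Old-style substring: no copy, just a new window onto the same array.
        LegacyString substring(int begin, int end) {
            return new LegacyString(value, offset + begin, end - begin);
        }

        // Workaround: copy only the characters this string actually uses.
        LegacyString compactCopy() {
            return new LegacyString(Arrays.copyOfRange(value, offset, offset + count), 0, count);
        }

        @Override
        public String toString() {
            return new String(value, offset, count);
        }
    }

    public static void main(String[] args) {
        LegacyString line = LegacyString.of("2011-11-33,rest of a much longer record");
        LegacyString date = line.substring(0, 10);
        // The 10-character date still pins the whole parent array.
        System.out.println(date + " shares parent array: " + (date.value == line.value));
        System.out.println("backing array length: " + date.value.length);
        LegacyString trimmed = date.compactCopy();
        System.out.println(trimmed + " backing array length: " + trimmed.value.length);
    }
}
```

Because the shared array stays reachable as long as any window onto it is alive, a short string can keep a large array in memory, which is the situation the question is about.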
However, when we profile memory usage with YourKit or MAT, something strange happens: a String that references the char array does not include the char array in its own retained size.
For example (semi-pseudo-code):
    String date = "2011-11-33";   // retained size: 24 bytes
    date.value = char[1172];      // retained size: 2360 bytes
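As a sanity check on the 2360-byte figure: it matches a `char[]` of 1172 elements under a common 64-bit HotSpot layout, which is an assumption here (a 16-byte array header, that is a 12-byte object header with compressed class pointers plus a 4-byte length field, 2 bytes per `char`, and 8-byte alignment). Other JVMs or flags can give different numbers.

```java
public class CharArraySize {
    // Assumed layout (typical 64-bit HotSpot with compressed class pointers):
    // 16-byte array header, 2 bytes per char, sizes rounded up to 8 bytes.
    static final int ARRAY_HEADER_BYTES = 16;
    static final int ALIGNMENT = 8;

    static long charArraySize(int length) {
        long raw = ARRAY_HEADER_BYTES + 2L * length;
        return (raw + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT; // round up to alignment
    }

    public static void main(String[] args) {
        System.out.println(charArraySize(1172)); // 16 + 2 * 1172 = 2360
    }
}
```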
The profiler reports the String's retained size as 24 bytes, without including the retained size of the character array. This may make sense if many strings reference the same character array because of many substring operations: garbage-collecting one of those strings would not free the shared array.
Now, when this string is placed in some kind of collection, such as an array or a list, the retained size of that collection includes the retained sizes of all its strings, including the retained size of the character array.
Then we have this situation:
    Array retained size = 300 bytes
        array[0] = String (40 bytes)
        array[1] = String (40 bytes)
        array[1].value = char[] (220 bytes)
So you have to examine each array entry to work out where the retained size comes from.
Again, this can be explained: the array holds all the strings that reference the same character array, so the array's retained size is correct.
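The behaviour described so far matches the textbook definition of retained size: the total shallow size of everything that would become unreachable from the GC roots if the object were removed. Below is a minimal sketch of that definition on a hypothetical toy heap (the `Heap` class, object names, and the array's own 24 bytes are made up; the 40- and 220-byte figures follow the example above). It does not claim to reproduce how YourKit or MAT compute their numbers.

```java
import java.util.*;

public class SharedCharArrayDemo {
    // Toy heap model (hypothetical): named objects with shallow sizes and outgoing references.
    static final class Heap {
        final Map<String, Integer> shallow = new HashMap<>();
        final Map<String, List<String>> refs = new HashMap<>();
        final Set<String> roots = new HashSet<>();

        void add(String name, int shallowSize, String... targets) {
            shallow.put(name, shallowSize);
            refs.put(name, List.of(targets));
        }

        // Objects reachable from the roots without ever passing through 'excluded' (may be null).
        Set<String> reach(String excluded) {
            Set<String> seen = new HashSet<>();
            Deque<String> todo = new ArrayDeque<>(roots);
            while (!todo.isEmpty()) {
                String cur = todo.pop();
                if (cur.equals(excluded) || !seen.add(cur)) continue;
                todo.addAll(refs.getOrDefault(cur, List.of()));
            }
            return seen;
        }

        // Retained size: shallow sizes of everything that becomes unreachable without 'target'.
        int retained(String target) {
            Set<String> lost = reach(null);
            lost.removeAll(reach(target));
            int sum = 0;
            for (String o : lost) sum += shallow.get(o);
            return sum;
        }
    }

    static Heap example() {
        Heap h = new Heap();
        h.add("array", 24, "s0", "s1"); // the collection
        h.add("s0", 40, "chars");       // two Strings sharing one char[]
        h.add("s1", 40, "chars");
        h.add("chars", 220);
        h.roots.add("array");
        return h;
    }

    public static void main(String[] args) {
        Heap h = example();
        System.out.println("retained(s0)    = " + h.retained("s0"));    // 40: s1 still holds chars
        System.out.println("retained(s1)    = " + h.retained("s1"));    // 40
        System.out.println("retained(array) = " + h.retained("array")); // 324: strings + chars
    }
}
```

Removing one string leaves the char[] reachable through the other, so neither string retains it; removing the array makes both strings and the char[] unreachable, so the array retains all of them.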
Now we get to the problem.
A separate object of ours holds a reference to the array discussed above, as well as to a second array containing the same strings. In both arrays, the strings reference the same character array. That is expected, since they are the same strings. However, the retained size of this character array is counted in both arrays inside this new object. In other words, the retained size appears to be doubled. If I remove the first array, the second array still references the character array, and vice versa. This is confusing, because it looks as though Java maintains two separate references to the same character array. How can that be? Is this a real Java memory problem, or is it just the way the profilers display the information?
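To make the two-array scenario concrete, here is a sketch on a hypothetical toy heap that contrasts two different per-object measures: the dominator-style retained size (what becomes unreachable without the object) and a "deep" or reachable size (everything reachable from the object, shared objects included). Assumptions: the second array holds distinct String objects that share the same char[], and all sizes other than 40 and 220 bytes are made up. The sketch only shows what each definition yields; it does not establish which figure YourKit or MAT is displaying in your case.

```java
import java.util.*;

public class TwoArraysDemo {
    // Toy heap model (hypothetical): named objects, shallow sizes, outgoing references.
    static final class Heap {
        final Map<String, Integer> shallow = new HashMap<>();
        final Map<String, List<String>> refs = new HashMap<>();
        final Set<String> roots = new HashSet<>();

        void add(String name, int shallowSize, String... targets) {
            shallow.put(name, shallowSize);
            refs.put(name, List.of(targets));
        }

        // Objects reachable from 'starts', never passing through 'excluded' (may be null).
        Set<String> reach(Collection<String> starts, String excluded) {
            Set<String> seen = new HashSet<>();
            Deque<String> todo = new ArrayDeque<>(starts);
            while (!todo.isEmpty()) {
                String cur = todo.pop();
                if (cur.equals(excluded) || !seen.add(cur)) continue;
                todo.addAll(refs.getOrDefault(cur, List.of()));
            }
            return seen;
        }

        int sizeOf(Set<String> objects) {
            int sum = 0;
            for (String o : objects) sum += shallow.get(o);
            return sum;
        }

        // Retained size: what becomes unreachable from the roots without 'target'.
        int retained(String target) {
            Set<String> lost = reach(roots, null);
            lost.removeAll(reach(roots, target));
            return sizeOf(lost);
        }

        // Deep size: everything reachable from 'target', shared objects included.
        int reachableFrom(String target) {
            return sizeOf(reach(List.of(target), null));
        }
    }

    static Heap example() {
        Heap h = new Heap();
        h.add("holder", 16, "arr1", "arr2");
        h.add("arr1", 24, "s0", "s1");
        h.add("arr2", 24, "t0", "t1");
        // Four distinct Strings, all sharing one char[].
        for (String s : List.of("s0", "s1", "t0", "t1")) h.add(s, 40, "chars");
        h.add("chars", 220);
        h.roots.add("holder");
        return h;
    }

    public static void main(String[] args) {
        Heap h = example();
        System.out.println("retained(arr1)      = " + h.retained("arr1"));      // 104
        System.out.println("retained(arr2)      = " + h.retained("arr2"));      // 104
        System.out.println("retained(holder)    = " + h.retained("holder"));    // 444
        System.out.println("reachableFrom(arr1) = " + h.reachableFrom("arr1")); // 324
        System.out.println("reachableFrom(arr2) = " + h.reachableFrom("arr2")); // 324
    }
}
```

Under the retained-size definition the shared char[] is charged to neither array, only to `holder`, which dominates both. Under the reachable-size definition it is counted once per array, so adding the two figures together counts the same 220 bytes twice even though only one char[] exists in the heap.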
This has caused us a lot of headaches while trying to track down the huge memory usage in our application.
Again, I hope someone out there can understand the question and explain it.
Thanks for the help.