Java "for" loop implementation prevents garbage collection

UPD 11/21/2017: bug fixed in JDK, see comment from Vicente Romero

Summary:

If the enhanced for statement is used with any Iterable implementation, the collection remains in the heap until the end of the current scope (method, statement body) and is not garbage collected. This happens even if you have no other references to the collection and the application needs to allocate new memory.

http://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8175883

https://bugs.openjdk.java.net/browse/JDK-8175883

Example:

Suppose I have the following code, which allocates a list of large strings with random content:

    import java.util.ArrayList;

    public class IteratorAndGc {

        // number of strings and the size of every string
        static final int N = 7500;

        public static void main(String[] args) {
            System.gc();

            gcInMethod();

            System.gc();
            showMemoryUsage("GC after the method body");

            ArrayList<String> strings2 = generateLargeStringsArray(N);
            showMemoryUsage("Third allocation outside the method is always successful");
        }

        // main testable method
        public static void gcInMethod() {

            showMemoryUsage("Before first memory allocating");
            ArrayList<String> strings = generateLargeStringsArray(N);
            showMemoryUsage("After first memory allocation");

            // this is the only difference: after the iterator is created, memory won't be collected until the end of this function
            for (String string : strings);
            showMemoryUsage("After iteration");

            strings = null; // discard the reference to the array

            // some say this doesn't guarantee garbage collection,
            // Oracle says "the Java Virtual Machine has made a best effort to reclaim space from all discarded objects".
            // but no matter: the program behavior remains the same with or without this line. You may skip it and test.
            System.gc();

            showMemoryUsage("After force GC in the method body");

            try {
                System.out.println("Try to allocate memory in the method body again:");
                ArrayList<String> strings2 = generateLargeStringsArray(N);
                showMemoryUsage("After secondary memory allocation");
            } catch (OutOfMemoryError e) {
                showMemoryUsage("!!!! Out of memory error !!!!");
                System.out.println();
            }
        }

        // function to allocate and return a reference to a lot of memory
        private static ArrayList<String> generateLargeStringsArray(int N) {
            ArrayList<String> strings = new ArrayList<>(N);
            for (int i = 0; i < N; i++) {
                StringBuilder sb = new StringBuilder(N);
                for (int j = 0; j < N; j++) {
                    sb.append((char) Math.round(Math.random() * 0xFFFF));
                }
                strings.add(sb.toString());
            }

            return strings;
        }

        // helper method to display current memory status
        public static void showMemoryUsage(String action) {
            long free = Runtime.getRuntime().freeMemory();
            long total = Runtime.getRuntime().totalMemory();
            long max = Runtime.getRuntime().maxMemory();
            long used = total - free;
            System.out.printf("\t%40s: %10dk of max %10dk%n", action, used / 1024, max / 1024);
        }
    }

Compile and run it with limited memory, for example 180 MB:

 javac IteratorAndGc.java && java -Xms180m -Xmx180m IteratorAndGc 

and at runtime I get:

    Before first memory allocating: 1251k of max 176640k
    After first memory allocation: 131426k of max 176640k
    After iteration: 131426k of max 176640k
    After force GC in the method body: 110682k of max 176640k (almost nothing collected)
    Try to allocate memory in the method body again:
    !!!! Out of memory error !!!!: 168948k of max 176640k
    GC after the method body: 459k of max 176640k (the garbage is collected!)
    Third allocation outside the method is always successful: 117740k of max 163840k

So, inside gcInMethod(), I tried to allocate the list, iterate over it, drop the reference to the list, (optionally) force garbage collection, and allocate a similar list again. But I cannot allocate the second array due to lack of memory.

At the same time, outside the function body, I can successfully force garbage collection (optional) and allocate an array of the same size again!

To avoid this OutOfMemoryError inside the function body, it is enough to delete or comment out just one line:

    for (String string : strings); // <- this is evil!!!

and then the output is as follows:

    Before first memory allocating: 1251k of max 176640k
    After first memory allocation: 131409k of max 176640k
    After iteration: 131409k of max 176640k
    After force GC in the method body: 497k of max 176640k (the garbage is collected!)
    Try to allocate memory in the method body again:
    After secondary memory allocation: 115541k of max 163840k
    GC after the method body: 493k of max 163840k (the garbage is collected!)
    Third allocation outside the method is always successful: 121300k of max 163840k

So, without the iteration, the garbage was successfully collected after the reference to the strings was dropped. The memory was then allocated a second time (inside the function body) and a third time (outside the method).

My assumption:

the for syntax construct is compiled to

    Iterator iter = strings.iterator();
    while (iter.hasNext()) {
        iter.next();
    }

(and I checked this by disassembling the class with javap -c IteratorAndGc.class)

And it looks like this iterator reference stays in scope until the end. You do not have access to the reference to nullify it, and the GC cannot collect the list.

Perhaps this is normal behavior (maybe it is even specified for javac, but I didn't find it), but IMHO, if the compiler creates some instances, it should take care to drop them from the scope after use.

This is how I expect the for statement to be implemented:

    Iterator iter = strings.iterator();
    while (iter.hasNext()) {
        iter.next();
    }
    iter = null; // <--- flush the water!

The Java compiler and runtime versions used:

    javac 1.8.0_111
    java version "1.8.0_111"
    Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
    Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)

Note:

  • the question is not about programming style, best practices, conventions, etc.; it is about the efficiency of the Java platform.

  • the question is not about the behavior of System.gc() (you can remove all gc calls from the example): during the second allocation of the strings, the JVM should release the discarded memory on its own.

Link to the Java test class; online compiler for testing (but this resource has only a 50 MB heap, so use N = 5000)

6 answers

Finally, the Oracle/OpenJDK bug has been accepted and confirmed (not yet fixed):

http://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8175883

https://bugs.openjdk.java.net/browse/JDK-8175883

Quotes from the comments in the bug report:

This is a problem reproducible on both 8 and 9

There seems to be some issue where the program holds on to its own implicit, auto-generated reference to the memory block until the next implicit use, and its memory stays locked, which causes OOM

(this confirms @vanza's expectations; see this example from the JDK developer)

According to the specification, this should not happen.

(this is the answer to my question: if the compiler creates some instances, it should take care to drop them from the scope after use)

UPD 11/21/2017: bug fixed in JDK, see comment from Vicente Romero


The only relevant part of the enhanced for statement here is the additional local reference to the object.

Your example can be reduced to

    public class Example {
        private static final int length = (int) (Runtime.getRuntime().maxMemory() * 0.8);

        public static void main(String[] args) {
            byte[] data = new byte[length];
            Object ref = data; // this is the effect of your "foreach loop"
            data = null;
            // ref = null; // uncommenting this also makes this complete successfully

            byte[] data2 = new byte[length];
        }
    }

This program will also fail with an OutOfMemoryError. If you remove the ref declaration (and its initialization), it will succeed.

The first thing you need to understand is that scope has nothing to do with garbage collection. Scope is a compile-time concept that defines where identifiers and names in the program source code can be used to refer to program entities.

Garbage collection is driven by reachability. If the JVM can determine that an object cannot be accessed by any potential continuing computation from any live thread, it considers that object eligible for garbage collection. Also, System.gc() is useless here, since the JVM will run a major collection anyway if it cannot find space to allocate the new object.
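To make the reachability point concrete, here is a minimal sketch that uses a WeakReference to observe collection. The class and method names are hypothetical. It assumes Java 9 or later, because Reference.reachabilityFence does not exist in the Java 8 used in the question. While an object is strongly reachable, a weak reference to it is guaranteed not to be cleared. Once the last strong reference is dropped, the object is only eligible for collection, and since System.gc() is just a hint, the second result may differ between JVMs and runs.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

public class ReachabilityDemo {

    // Requests a GC and reports whether the weak reference still points to its referent.
    // System.gc() is only a hint, so a "false" result is never guaranteed.
    static boolean survivesGc(WeakReference<?> ref) {
        System.gc();
        return ref.get() != null;
    }

    // 'data' stays strongly reachable until the fence, so the weak reference
    // cannot be cleared in between, regardless of what the collector does.
    static boolean stronglyReachableSurvives() {
        byte[] data = new byte[1024 * 1024];
        WeakReference<byte[]> ref = new WeakReference<>(data);
        boolean alive = survivesGc(ref);
        Reference.reachabilityFence(data); // assumption: Java 9+ API
        return alive;
    }

    // After the only strong reference is overwritten, the array becomes eligible
    // for collection; whether it is actually collected is up to the JVM.
    static boolean droppedReferenceSurvives() {
        byte[] data = new byte[1024 * 1024];
        WeakReference<byte[]> ref = new WeakReference<>(data);
        data = null;
        return survivesGc(ref);
    }

    public static void main(String[] args) {
        System.out.println("strongly reachable survives: " + stronglyReachableSurvives()); // always true
        System.out.println("dropped reference survives:  " + droppedReferenceSurvives());  // JVM-dependent
    }
}
```

Only the first result is guaranteed. The fence call matters because an optimizing JIT may treat a variable as dead after its last read, even while it is still in scope.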

So the question is: why can the JVM not determine that the byte[] object is no longer accessed if we store it in a second local variable?

I have no answer. Different garbage collection algorithms (and JVMs) may behave differently in this regard. It seems that the JVM does not mark the object as unreachable while a second entry in the local variable table holds a reference to it.


Here's another scenario in which the JVM did not behave exactly as you would expect with regard to garbage collection:

  • OutOfMemoryError when an insecure code block is commented out

So this is a really interesting question that could benefit from slightly different wording. More specifically, focusing on the generated bytecode instead would clear up a lot of confusion. So let's do that.

Given this code:

    List<Integer> foo = new ArrayList<>();
    for (Integer i : foo) {
        // nothing
    }

This is the generated bytecode:

     0: new           #2    // class java/util/ArrayList
     3: dup
     4: invokespecial #3    // Method java/util/ArrayList."<init>":()V
     7: astore_1
     8: aload_1
     9: invokeinterface #4, 1    // InterfaceMethod java/util/List.iterator:()Ljava/util/Iterator;
    14: astore_2
    15: aload_2
    16: invokeinterface #5, 1    // InterfaceMethod java/util/Iterator.hasNext:()Z
    21: ifeq          37
    24: aload_2
    25: invokeinterface #6, 1    // InterfaceMethod java/util/Iterator.next:()Ljava/lang/Object;
    30: checkcast     #7    // class java/lang/Integer
    33: astore_3
    34: goto          15

So, step by step:

  • Save the new list in local variable 1 ("foo")
  • Store the iterator in local variable 2
  • For each item, save the item in local variable 3

Note that after the loop, nothing that was used in the loop is cleaned up. This is not limited to the iterator: the last element is still stored in local variable 3 after the loop ends, even though the code has no reference to it.

So, before we go "that's wrong, wrong, wrong", let's see what happens when I add this code after the code above:

 byte[] bar = new byte[0]; 

You get this bytecode after the loop:

    37: iconst_0
    38: newarray       byte
    40: astore_2

Oh, look at that. The newly declared local variable is stored in the same local variable slot as the iterator. So now the reference to the iterator is gone.

Note that this differs from the Java code you assumed to be equivalent. The actual Java equivalent, which generates the same bytecode, is this:

    List<Integer> foo = new ArrayList<>();
    for (Iterator<Integer> i = foo.iterator(); i.hasNext(); ) {
        Integer val = i.next();
    }

And still no cleanup. Why?

Well, here we are in guessing territory, unless it is actually specified in the JVM specification (I haven't checked). In any case, to do this cleanup, the compiler would have to generate extra bytecode (2 instructions, aconst_null and astore_<n>) for each variable that goes out of scope. That would make the code slower, and avoiding the slowdown would probably require adding complex optimizations to the JIT.

So why is your code not working?

You end up in a situation similar to the one above. The iterator is allocated and stored in local variable 1. Then your code tries to allocate a new list of strings. Since local variable 1 is no longer used, the new list will be stored in that same local variable (check the bytecode). But the allocation happens before the assignment, so the reference to the iterator is still there, and therefore there is not enough memory.

If you add this line before the try block, everything will work, even if you delete the System.gc() call:

 int i = 0; 

So it seems that the JVM developers made a choice: generate smaller/more efficient bytecode instead of explicitly nulling variables that go out of scope. You happen to have written code that does not fit well with their assumptions about how people write code. Given that I have never seen this problem in real applications, it seems like a minor issue to me.


As already pointed out in other answers, the concept of variable scopes is not known at runtime. In compiled class files, local variables are just slots in the stack frame (addressed by an index) that are written to and read from. If several variables have disjoint scopes, they may use the same index, but there is no formal declaration of them. Only writing a new value discards the old one.

So there are three ways in which a reference held in a local variable slot can be considered unused:

  • The storage location is overwritten with a new value.
  • The method completes.
  • The subsequent code does not read the value.
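Point 2 suggests a simple workaround for the question's scenario: move the loop into its own method. The compiler-generated iterator then lives in that method's frame and disappears when the method returns. Below is a minimal sketch. The names and the small test data are hypothetical: a list of short strings stands in for the question's large ones.

```java
import java.util.ArrayList;
import java.util.List;

public class IterateInHelper {

    // The enhanced for loop lives in this helper, so its synthetic iterator
    // variable occupies a slot in the helper's frame, not in the caller's.
    // When the helper returns, that frame (and the slot) is gone.
    static long totalLength(List<String> strings) {
        long total = 0;
        for (String s : strings) {
            total += s.length();
        }
        return total;
    }

    static long process() {
        List<String> strings = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            strings.add("item" + i);
        }
        long total = totalLength(strings); // no hidden iterator is left in this frame
        strings = null; // drops this method's only reference to the list
        // A later large allocation here no longer competes with an iterator
        // reference left behind by a for-each loop in this frame.
        return total;
    }

    public static void main(String[] args) {
        System.out.println(process()); // prints 6890
    }
}
```

Here process() returns 6890, the total length of "item0" to "item999". A similar effect is expected with Java 8's Iterable.forEach, because the iteration then happens inside the forEach method's own frame. This is a sketch of the idea, not a measured fix.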

It should be obvious that the third point is the hardest to check, so it is not always applied. When the optimizer starts its work, however, it may lead to surprises in the other direction, as explained in "Can java finalize an object when it is still in scope?" and "finalize() called on strongly reachable object in Java 8".

In your case, the application runs only briefly and is probably not optimized. As a result, references may not be recognized as unused via point 3 when points 1 and 2 do not apply.

You can easily verify that this is the case. If you change the line

 ArrayList<String> strings2 = generateLargeStringsArray(N); 

to

    ArrayList<String> strings2 = null;
    strings2 = generateLargeStringsArray(N);

the OutOfMemoryError goes away. The reason is that the storage location holding the Iterator used in the preceding for loop has not been overwritten at this point. The new local variable strings2 will reuse that slot, but this only takes effect when a new value is actually written to it. So initializing it with null before calling generateLargeStringsArray(N) overwrites the Iterator reference and allows the old list to be collected.

Alternatively, you can run the program in its original form with the -Xcomp option, which forces compilation of all methods. On my machine, it caused a noticeable startup slowdown, but thanks to the analysis of variable usage, the OutOfMemoryError went away as well.

An application that allocates this much memory (relative to the maximum heap size) during initialization, i.e. while most methods are still interpreted, is an unusual corner case. Usually, most hot methods are compiled well before memory consumption gets high. If you repeatedly run into this corner case in a real application, then -Xcomp may work for you.


Thanks for the bug report. We have fixed this bug, see JDK-8175883. As noted here, for the enhanced for statement, javac was generating synthetic variables, so for code like:

 void foo(String[] data) { for (String s : data); } 

javac roughly generated:

 for (String[] arr$ = data, len$ = arr$.length, i$ = 0; i$ < len$; ++i$) { String s = arr$[i$]; } 

As mentioned above, this translation approach means that the synthetic variable arr$ holds a reference to the data array. That reference prevents the GC from collecting the array even when the array is no longer used inside the method. This bug was fixed by generating this code:

    String[] arr$ = data;
    String s;
    for (int len$ = arr$.length, i$ = 0; i$ < len$; ++i$) {
        s = arr$[i$];
    }
    arr$ = null;
    s = null;

The idea is to set to null any synthetic variables of reference type that javac creates to translate the loop. If the array had a primitive element type, the compiler would not generate the last null assignment (the one for the loop variable). The bug was fixed in the JDK repo.


Just to summarize the answers:

As @sotirios-delimanolis mentioned in his comment about the enhanced for statement, my assumption is explicitly specified: the for syntactic sugar is compiled to an Iterator with hasNext()/next() calls:

#i is an automatically generated identifier that is distinct from any other identifiers (automatically generated or otherwise) that are in scope (§6.3) at the point where the enhanced for statement occurs.
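The translation that the JLS (§14.14.2) gives for an Iterable can be written out by hand. In the sketch below, the names are hypothetical and the variable it plays the role of the generated #i. The translation itself contains no step that clears #i after the loop, which is exactly the gap discussed in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ForEachDesugared {

    // Plain enhanced for loop over an Iterable.
    static List<String> withForEach(Iterable<String> source) {
        List<String> out = new ArrayList<>();
        for (String s : source) {
            out.add(s.toUpperCase());
        }
        return out;
    }

    // Hand-written equivalent of the JLS translation (simplified):
    //   for (I #i = Expression.iterator(); #i.hasNext(); ) {
    //       TargetType Identifier = (TargetType) #i.next();
    //       Statement
    //   }
    // 'it' stands in for the compiler-generated #i; nothing nulls it afterwards.
    static List<String> desugared(Iterable<String> source) {
        List<String> out = new ArrayList<>();
        for (Iterator<String> it = source.iterator(); it.hasNext(); ) {
            String s = it.next();
            out.add(s.toUpperCase());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "b", "c");
        System.out.println(withForEach(data)); // prints [A, B, C]
        System.out.println(desugared(data));   // prints [A, B, C]
    }
}
```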

As @vanza then showed in his answer, this automatically generated identifier may or may not be overwritten later. If it is overwritten, the memory can be freed; if not, the memory is not freed.

However, (for me) one question remains open: if the Java compiler or the JVM creates some implicit references, shouldn't it then take care of dropping them? Is there any guarantee that the same auto-generated iterator reference will be reused on subsequent calls before the next memory allocation? Shouldn't the rule be: whoever allocates memory takes care of freeing it? I would say it should take care of this. Otherwise, the behavior is undefined (it may fail with an OutOfMemoryError or it may not, who knows...)

Yes, my example is a corner case (nothing is initialized between the for iterator and the next memory allocation), but that does not mean it is impossible. Nor does it mean that this case is hard to hit. It is quite likely when working in a memory-limited environment with some big data and reallocating memory right after it has been used. I found this case in my production application, where I parse a large XML file that "eats" more than half of the memory.

(and the question is not only about iterators and for loops; I suppose this is a general problem: the compiler or the JVM sometimes does not clear its own implicit references).

