Should I reset the Java heap space after use?

I work with some modeling algorithms in R, one of which is implemented in Java (bartMachine). I found that, given the size of my data, I need to increase Java's maximum heap space before running the algorithm.

I do it like this:

options(java.parameters = "-Xmx16g")

My question is: do I need to reset the heap space afterwards if no other algorithm uses Java (or at least not nearly that much heap)? Or will the memory allocated to Java be freed as needed, without any performance loss?
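For reference, my full setup looks like this. (As I understand it, rJava reads java.parameters only once, when the JVM is first initialized, so the option has to be set before the package is loaded.)

```r
# Set the JVM options *before* any rJava-backed package is attached;
# once the JVM has started, changing java.parameters has no effect
# for the rest of the R session.
options(java.parameters = "-Xmx16g")
library(bartMachine)  # starts the JVM with a 16 GB maximum heap
```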

I have already searched around this topic and I understand how to set or change the heap size. I also understand that R/Java will do garbage collection to remove old objects from memory and free up space.

What I don't understand is how changing the heap space affects the memory available to other programs, and whether or not it is a good idea to reset the heap size afterwards in this case.

Some answers/resources I have already looked at:

Is there a way to lower the Java heap when not in use?

Java garbage collector - When does it collect?

http://www.bramschoenmakers.nl/en/node/726

https://cran.r-project.org/web/packages/bartMachine/bartMachine.pdf

2 answers

It is implementation-defined and depends on many parameters; the garbage collector in use can affect it. On a Mac with Oracle's JVM 1.7, the default collector is -XX:+UseParallelGC, and that collector does not release memory back to the OS. I tried this on a Mac and it did not release anything until I used -XX:+UseG1GC. You can see which defaults are used on your system:

 java -XX:+PrintGCDetails -XX:+PrintCommandLineFlags -version 

There are several parameters you can use to tune memory deallocation, if you use a JVM and garbage collector that support it, e.g.

 -XX:MinHeapFreeRatio (default is 40)
 -XX:MaxHeapFreeRatio (default is 70)

but they are hit and miss (the JVM decides when it releases memory; simply freeing a ton of objects may not trigger it).
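In the R/rJava setting, such flags can be passed alongside -Xmx before the JVM starts. This is only a sketch: whether memory is actually returned still depends on your JVM version and collector, as above.

```r
# Sketch: pass GC tuning flags along with the heap cap.
# Must run before the JVM is initialized (i.e., before library(bartMachine)).
options(java.parameters = c("-Xmx16g",
                            "-XX:+UseG1GC",            # a collector that can shrink the heap
                            "-XX:MinHeapFreeRatio=20", # shrink sooner than the default 40
                            "-XX:MaxHeapFreeRatio=40"))# keep less free headroom than the default 70
library(bartMachine)
```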


Recently I have been working with a non-ML program that is very Java-heavy, and I feel your pain.

I can't tell you whether to reset the dynamically allocated memory on the basis of one indisputable technical fact, but my personal experience says that if you are going to keep processing in the native R environment after the Java work, you probably should. It's best to control what you can.

Here's why:

The only time I have ever run out of memory (even when working with MASSIVE flat files) was when I had the JVM running. And it was not a one-off; it happened often.

Even just reading and writing large Excel files through XLConnect, which is Java-backed, chews up memory quickly. The failure seems to lie in how R and Java play with each other.

And R does not automatically garbage-collect the way you would hope. It collects when the OS asks for more memory, but everything can slow down long before that happens.

Also, R only sees the objects in memory that it created itself, not the ones the JVM manages, so your Java heap will hang around without R's knowledge: if the JVM created it, R will not clear it unless Java does so first. And if memory is recycled selectively, you can be left with fragmented memory gaps that hurt performance badly.

My personal approach was to create my sets, variables, and frames, subset down to just what I need, and then rm() and gc(): delete, and force garbage collection.

Then proceed to the next step and do the heavy lifting. If I am running a Java-based package, I do this cleanup more often to keep memory clean.
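As a sketch of that workflow (the object names X_train, y_train, and X_test are placeholders for your own data):

```r
library(bartMachine)

fit   <- bartMachine(X_train, y_train)  # heavy Java-backed step
preds <- predict(fit, X_test)           # keep only the results you need

rm(fit)  # drop the R-side reference to the Java-backed model
gc()     # force a collection before the next heavy step
```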

Once the Java process is done, I use detach(yourlibraryname) and gc() to clear everything.
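For example, after a bartMachine session:

```r
detach("package:bartMachine", unload = TRUE)  # remove the package from the search path
gc()                                          # then force a collection
```

Note that detach() only removes the package from R's search path; whether the JVM's memory actually goes back to the OS still depends on the JVM and collector, as the first answer explains.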

If you adjusted the heap, I would redefine it here, shrinking the allocation you give to Java's dynamic memory, because as far as I can tell R cannot reclaim it while the JVM is still loaded but idle. So reset it and give the memory back for R's own use. I think that will ultimately pay off in faster processing and fewer lock-ups.

The best way to find out how this affects your system is to use Sys.time() or proc.time() to see how long your script takes with and without forced garbage collection, detaching, and heap reallocation.
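A minimal timing sketch (the bartMachine call and its inputs are placeholders; wrap whichever step you want to measure):

```r
# Time a heavy step; run once with your cleanup steps and once without,
# then compare the elapsed times.
start <- proc.time()
fit <- bartMachine(X_train, y_train)  # placeholder heavy step
print(proc.time() - start)            # user / system / elapsed seconds
```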

You can get a complete picture of how to do this here:

UCLA IDRE: proc.time functions

Hope this helps some!

