Java GC overhead limit exceeded

I evaluate data from a text file in a fairly large algorithm.

If the text file contains more than a certain number of data points (the minimum I need is 1.3 million), I get the following error:

 Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
     at java.util.regex.Matcher.<init>(Unknown Source)
     at java.util.regex.Pattern.matcher(Unknown Source)
     at java.lang.String.replaceAll(Unknown Source)
     at java.util.Scanner.processFloatToken(Unknown Source)
     at java.util.Scanner.nextDouble(Unknown Source)

This happens when I run it in Eclipse with the following settings for the installed JRE 6 (standard VM):

 -Xms20m -Xmx1024m -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=40 -XX:NewSize=10m -XX:MaxNewSize=10m -XX:SurvivorRatio=6 -XX:TargetSurvivorRatio=80 -XX:+CMSClassUnloadingEnabled 

Please note that it works fine if I process only part of the text file.

I have read a lot about this topic, and it seems that either I have a memory leak somewhere, or I am storing too much data in arrays (which, I think, is the case).

Now my problem is: how can I get around this?

  • Is it possible to change my settings so that I can still perform the calculation, or do I really need more processing power? (I don't know where I would get that.)
  • I read somewhere that it is better to work with IDs and references than to put all the data into arrays and let the processor handle it. But how would I change my code so that it only works with references?

Basically, I am looking for general recommendations for avoiding excessive memory usage and memory leaks.

+7
3 answers

The really critical VM argument is -Xmx1024m, which tells the VM to use at most 1024 megabytes of heap. The simplest fix is to use a larger number there. You can try -Xmx2048m or -Xmx4096m, or any value for which your machine has enough RAM.

I'm not sure you are getting much benefit from any of the other VM arguments. For the most part, if you tell Java how much heap to use, it will be smart about the rest of the settings. I suggest removing everything except the -Xmx argument and seeing how that performs.
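For example, outside of Eclipse the whole launch could be reduced to something like this (the class and file names here are placeholders, not from the question):

     java -Xmx4096m MyAnalysis data.txt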

The best solution is to try to improve your algorithm, but I have not read it in sufficient detail to offer any suggestions.
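One hint the stack trace itself gives: Scanner.nextDouble() runs regular expressions for every token (note the Matcher and String.replaceAll frames), which creates a lot of short-lived garbage. A sketch of a plainer reading loop, kept Java 6-compatible and assuming one value per line (the class name and file layout are assumed, not taken from the question):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class DataReader {
        public static void main(String[] args) throws IOException {
            // A primitive double[] holds 1.3 million values in ~10 MB.
            double[] data = new double[1300000];
            int n = 0;
            BufferedReader in = new BufferedReader(new FileReader(args[0]));
            try {
                String line;
                while ((line = in.readLine()) != null && n < data.length) {
                    line = line.trim();
                    if (line.length() == 0) continue; // skip blank lines
                    // Double.parseDouble avoids Scanner's regex machinery.
                    data[n++] = Double.parseDouble(line);
                }
            } finally {
                in.close();
            }
            System.out.println("read " + n + " values");
        }
    }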

+3

As you say, the data set is really very large. If it does not fit into the memory of one computer even after raising the -Xmx JVM argument, you can move to cluster computing, with many computers working on your problem. To do this you would use a Message Passing Interface (MPI) library.

MPJ Express is a very good MPI implementation for Java; for languages like C/C++, good MPI implementations exist as well, such as Open MPI and MPICH2. I am not sure whether it will help you in this situation, but it will certainly help you in future projects.
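For a sense of what this looks like, here is a minimal MPJ Express skeleton; the data-partitioning logic is omitted, this is only a sketch of the API shape:

    import mpi.MPI;

    public class HelloMPJ {
        public static void main(String[] args) throws Exception {
            MPI.Init(args);
            int rank = MPI.COMM_WORLD.Rank();   // this process's id
            int size = MPI.COMM_WORLD.Size();   // total number of processes
            // Each process would read and evaluate its own slice of the file here.
            System.out.println("process " + rank + " of " + size);
            MPI.Finalize();
        }
    }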

+3

I suggest you

  • use a profiler to minimize memory usage. I suspect you could reduce it by a factor of ten or more with primitives, binary data, and more compact collections (see the first sketch after this list).
  • increase the amount of memory in your machine. The last time I was testing with hundreds of signals, I had 256 GB of main memory, and that was quite a lot. The more memory you can get, the better.
  • use memory-mapped files to improve memory efficiency (see the second sketch after this list).
  • reduce the size of your data set to something your machine and program can handle.
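
On the first bullet: a boxed Double costs an object header plus payload plus a reference per element, while a primitive double[] costs a flat 8 bytes per value. A small sketch of the contrast (the sizes in the comments are approximate and JVM-dependent):

    public class BoxedVsPrimitive {
        public static void main(String[] args) {
            // Boxed: one heap object per value, plus a reference in the
            // backing array -- roughly 20-30 bytes per element, all of it
            // tracked by the GC.
            java.util.List<Double> boxed = new java.util.ArrayList<Double>();
            boxed.add(1.0);

            // Primitive: one contiguous block, 8 bytes per value, no
            // per-element header. 1.3 million doubles fit in about 10 MB.
            double[] primitive = new double[1300000];
            primitive[0] = 1.0;

            System.out.println(boxed.size() + " boxed, "
                    + primitive.length + " primitive slots");
        }
    }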
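On the memory-mapped files bullet, a sketch assuming the data has first been converted to a binary file of raw 8-byte doubles (the class name is a placeholder):

    import java.io.File;
    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class MappedDoubles {
        public static void main(String[] args) throws Exception {
            File f = new File(args[0]); // raw 8-byte doubles, back to back
            RandomAccessFile raf = new RandomAccessFile(f, "r");
            try {
                // The mapping lives outside the Java heap, so the GC
                // never has to scan or copy this data.
                MappedByteBuffer buf = raf.getChannel()
                        .map(FileChannel.MapMode.READ_ONLY, 0, f.length());
                double sum = 0;
                while (buf.remaining() >= 8) {
                    sum += buf.getDouble();
                }
                System.out.println("sum = " + sum);
            } finally {
                raf.close();
            }
        }
    }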
+1
