HashSet: slow performance with a large set

I ran into a problem and cannot find a solution. I use a HashSet to store values. The values that I store are objects of a custom type Cycle, in which I have overridden hashCode and equals as follows, to make sure the slow performance is not caused by the hashCode or equals methods. I also set the HashSet's initial capacity to 10,000,000.

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + (int) (cycleId ^ (cycleId >>> 32));
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        Cycle other = (Cycle) obj;
        if (cycleId != other.cycleId)
            return false;
        return true;
    }

After the first 1,500,000 values, when I try to add a new value (using the add method of the HashSet class), the program runs very slowly. Eventually I get an out-of-memory exception (Exception in thread "Thread-0" java.lang.OutOfMemoryError: Java heap space) before the stored values reach 1,600,000.
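
For reference, here is a minimal sketch of what the program does (the cycleId field name comes from the code above; the constructor and the loop are a simplification of my real code):

    import java.util.HashSet;
    import java.util.Set;

    public class Cycle {
        private final long cycleId;

        public Cycle(long cycleId) {
            this.cycleId = cycleId;
        }

        // hashCode() and equals() are overridden exactly as shown above.

        public static void main(String[] args) {
            Set<Cycle> cycles = new HashSet<Cycle>(10000000);
            for (long id = 0; id < 2000000; id++) {
                cycles.add(new Cycle(id));   // slows down sharply around 1,500,000
            }
        }
    }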

For my IDE I use Eclipse, so the next step was to increase the JVM heap size from the default to 1 GB (using the -Xmx1000M and -Xms1000M options). Eclipse now starts with ten times more available memory (I can see this at the bottom right, where the total heap size and used memory are displayed), but I still get the same slow performance and the same memory error at the same values (between 1,500,000 and 1,600,000), which is very strange.

Does anyone have any idea what the problem could be?

Thank you in advance

+6
java performance hashset
9 answers

You do not want to increase the JVM heap for Eclipse; you want to set it for your program.

Go to Run > Run Configurations (or Debug Configurations) and set the VM arguments there.
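
For example, something like this in the VM arguments box (the values are only an illustration; size them for your program):

    -Xms1000M -Xmx1000M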

+10

There is not enough heap memory (increase it with -Xmx, e.g. -Xmx512m). When free memory gets very low, a lot of time is spent in the garbage collector, which frantically scans the heap for unreachable objects.
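
For example, when launching outside the IDE the flags go on the java command line (MyApp here is just a placeholder for your main class):

    java -Xms512m -Xmx512m MyApp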

Your hashCode() is fine; extra points for using all the bits of the cycleId long.

Edit: I now see that you increased the memory and it did not help. First of all, are you sure you actually managed to increase it? You can check this with jconsole: connect to your application and look at its heap size.

Looking for an alternative explanation: is there any specific pattern in your cycleId values that could make this hashCode() implementation bad? For example, if their 32 high-order bits were mostly identical to their 32 low-order bits. (Unlikely, I know.)
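
To illustrate that purely hypothetical degenerate case with the hashCode() from the question:

    // Hypothetical worst case: every cycleId's high 32 bits mirror its low 32 bits.
    public class HashCollapseDemo {
        public static void main(String[] args) {
            for (long low = 0; low < 5; low++) {
                long cycleId = (low << 32) | low;            // high half == low half
                // The question's hashCode() reduces to this expression:
                int hash = 31 + (int) (cycleId ^ (cycleId >>> 32));
                System.out.println(cycleId + " -> " + hash); // prints 31 every time
            }
        }
    }

All such values would land in the same bucket, and the HashSet would degrade into a linked list.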

But no: even if that were the case, you would see a gradual degradation of performance rather than a sharp drop at a specific point, and you are getting an OutOfMemoryError and frantic GC activity on top of that. So my best guess is still a memory problem. Either you did not increase the heap size as you thought, or something else is consuming memory at some point. (You could use a tool like VisualVM to profile it, take a heap dump on the OOME, and see what objects it contains.)

Edit 2: I have highlighted the part above that turned out to be correct.

+4

The amount of memory available to an application launched from Eclipse must be configured from the Run menu. Try:

Run → Run Configurations → Arguments → VM Arguments → -Xmx1000M

The reason your program is slow is the garbage collector: it runs constantly once memory gets close to the limit.
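
To confirm that, you could also add the standard -verbose:gc flag to the same VM arguments; if the heap really is exhausted, the console will show back-to-back collections. A sketch of the combined arguments:

    -Xmx1000M -verbose:gc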

+2

Have you tested your hashCode implementation? It always returns 31 for any value of cycleId. It is not strange that your HashMap is slow; it has linear performance.

+2

If you want to increase the amount of memory your program can use, increasing the Eclipse heap size will not help. You must put the parameter in the VM arguments of your program's launch configuration.

+1

The JVM throws "out of memory" NOT based on available memory alone. It also throws it when the time spent on garbage collection is too long; check it out. The exact implementation details depend on the JVM and the garbage collector implementation.

Increasing the memory would not help in this case. You may need to choose a different approach.
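
For reference, in HotSpot this behavior is governed by the GC overhead limit (it can be disabled with -XX:-UseGCOverheadLimit); when that limit trips, the error text differs from the one in the question, which is worth checking against the actual stack trace:

    java.lang.OutOfMemoryError: GC overhead limit exceeded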

+1

Your computer may not have enough physical memory, so it has to swap to disk.

0

How do you initialize your HashSet? You should be aware of its growth pattern. With each add operation, it checks whether it is approaching capacity. If it reaches a certain point (determined by its "load factor"), it performs a resize operation, which can be expensive. From the JavaDoc (for HashMap, the collection that backs HashSet):

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.
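
Applied to the numbers in the question, a minimal sketch (Cycle and the expected element count come from the question; the load factor shown is just the documented default made explicit):

    import java.util.HashSet;
    import java.util.Set;

    public class CycleSetSetup {
        public static Set<Cycle> newCycleSet() {
            // Size the set so that ~10,000,000 entries never trigger a rehash:
            // initialCapacity > expectedEntries / loadFactor (per the JavaDoc above).
            int expectedEntries = 10000000;
            float loadFactor = 0.75f;                 // the documented default
            int initialCapacity = (int) (expectedEntries / loadFactor) + 1;
            return new HashSet<Cycle>(initialCapacity, loadFactor);
        }
    }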

0

I am very disappointed by the number of answers telling the OP to increase the heap size of his application. That is not a solution; it is a quick and dirty patch that will not fix the underlying problem.

I found this presentation extremely informative: http://www.cs.virginia.edu/kim/publicity/pldi09tutorials/memory-efficient-java-tutorial.pdf

Basically, one slide lists the minimum size in bytes of each empty collection:

    ArrayList:  40 or 48
    LinkedList: 48
    HashMap:    56 or 120
    HashSet:    72 or 136

It turns out that a HashSet is practically a HashMap underneath, and (counterintuitively) takes up more memory, despite only holding values instead of key-value pairs.

0
