Fastest Java HashSet <Integer> Library

Following up on this rather old post, I need something that uses primitives and speeds up an application that holds a lot of HashSets of Integers:

 Set<Integer> set = new HashSet<Integer>(); 

So people mention libraries such as Guava, Javolution, and Trove, but there is no thorough comparison with benchmarks and performance results, or at least no answer backed by real experience. From what I see, many recommend Trove's TIntHashSet , but others say it's not that good; some say Guava is supercool and maintainable, but I don't need beauty and maintainability, just runtime, so pythonic Guava goes home :) Javolution? I visited the website, and it looks too old to me, which makes me suspicious.

The library should provide the best achievable time; memory doesn't matter.

Looking at "Thinking in Java," there is the idea of creating a custom HashMap with int[] as keys. So I would like to see something like that for a HashSet, or to just download and use an existing library that does it well.
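For illustration, here is a minimal sketch of such a primitive set: an int[] table with open addressing and linear probing. Every name and constant here is illustrative, not taken from any library, and Integer.MIN_VALUE is reserved as the empty-slot sentinel, so that one value cannot be stored.

    // Minimal open-addressing int hash set sketch; no boxing anywhere.
    public final class IntOpenSet {
        private static final int EMPTY = Integer.MIN_VALUE; // sentinel: MIN_VALUE cannot be stored
        private int[] table;
        private int size;

        public IntOpenSet(int expectedSize) {
            // Round up to a power of two at least twice the expected size.
            int cap = Integer.highestOneBit(Math.max(16, expectedSize * 2) - 1) << 1;
            table = new int[cap];
            java.util.Arrays.fill(table, EMPTY);
        }

        public boolean add(int value) {
            if (size * 3 >= table.length * 2) grow(); // keep load under ~2/3
            int mask = table.length - 1;
            int i = mix(value) & mask;
            while (table[i] != EMPTY) {
                if (table[i] == value) return false;  // already present
                i = (i + 1) & mask;                   // linear probing
            }
            table[i] = value;
            size++;
            return true;
        }

        public boolean contains(int value) {
            int mask = table.length - 1;
            int i = mix(value) & mask;
            while (table[i] != EMPTY) {
                if (table[i] == value) return true;
                i = (i + 1) & mask;
            }
            return false;
        }

        public void clear() {
            java.util.Arrays.fill(table, EMPTY);
            size = 0;
        }

        private void grow() {
            int[] old = table;
            table = new int[old.length << 1];
            java.util.Arrays.fill(table, EMPTY);
            size = 0;
            for (int v : old) if (v != EMPTY) add(v); // rehash into the larger table
        }

        private static int mix(int h) { // spread bits so sequential ints don't cluster
            h *= 0x9E3779B9;
            return h ^ (h >>> 16);
        }
    }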

EDIT (in response to the comments below): In my project I start with about 50 HashSet<Integer> collections, then I call a function about 1000 times, and each call internally creates up to 10 more HashSet<Integer> collections. If I change the initial parameters, these numbers can grow exponentially. I only use the add() , contains() and clear() methods on these collections, which is why they were chosen.

Now I'm looking for a library that implements a HashSet or something similar, but faster, by avoiding the autoboxing of Integer and perhaps by other tricks I don't know about. In fact, my data arrives as ints and is stored in these HashSet s.
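For concreteness, a minimal sketch of that workload with the plain JDK classes; the collection counts come from the description above, while the element values and loop bounds are made up:

    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    public class Workload {
        public static void main(String[] args) {
            List<Set<Integer>> sets = new ArrayList<>();
            for (int i = 0; i < 50; i++) sets.add(new HashSet<>()); // ~50 long-lived sets

            for (int call = 0; call < 1000; call++) {
                Set<Integer> local = new HashSet<>(); // up to 10 of these per call
                for (int value = 0; value < 10_000; value++) {
                    local.add(value);                 // boxes the int to an Integer
                    if (local.contains(value)) { /* ... */ }
                }
                local.clear();
            }
        }
    }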

+7
3 answers

Have you tried playing with the initial capacity and load factor parameters when creating your HashSet?

HashSet documentation

The initial capacity, as you may have guessed, refers to how many buckets the empty hash set will have when it is created, and the load factor is the threshold that determines when the hash table has to grow. As a rule, you want to keep the ratio of used buckets to total buckets below two thirds, which is often considered the best ratio for good, stable performance of a hash table.

Dynamic resizing of hash tables

Basically, try to set an initial capacity that covers your needs (to avoid rebuilding and rehashing the table as it grows), and play with the load factor until you find a sweet spot.

A lower load factor may help for your particular data distribution and insert/lookup pattern (a higher one is unlikely to, but your mileage may vary).
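As a sketch of that advice (the expected size and load factor here are placeholder values, not measurements):

    import java.util.HashSet;
    import java.util.Set;

    public class PresizedSet {
        public static void main(String[] args) {
            int expected = 10_000;                       // placeholder estimate of element count
            int capacity = (int) (expected / 0.66f) + 1; // buckets needed to stay under ~2/3 full
            Set<Integer> set = new HashSet<>(capacity, 0.66f);
            for (int i = 0; i < expected; i++) set.add(i); // never triggers a rehash
            System.out.println(set.size());
        }
    }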

0

Trove is a great choice.

The reason it is much faster than the generic collections is memory usage.

A java.util.HashSet<Integer> uses a java.util.HashMap<Integer, Object> internally (the values are all the same shared dummy object). In a HashMap , every key is wrapped in an Entry object. These add up to roughly 24 bytes for the Entry , plus 16 bytes for the boxed Integer , plus 4 bytes for the slot in the hash table itself. That gives 44 bytes, versus 4 bytes in Trove: up to 11x memory overhead (unused slots in the backing tables reduce the difference in practice).

See also these experiments:

http://www.takipiblog.com/2014/01/23/java-scala-guava-and-trove-collections-how-much-can-they-hold/
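A minimal usage sketch, assuming Trove 3.x is on the classpath; the constructor arguments mirror HashSet's initial capacity and load factor:

    import gnu.trove.set.hash.TIntHashSet;

    public class TroveExample {
        public static void main(String[] args) {
            // TIntHashSet stores raw ints in an int[] table: no boxing.
            TIntHashSet set = new TIntHashSet(10_000, 0.5f);
            set.add(42);                         // primitive overload, no Integer allocated
            boolean present = set.contains(42);  // primitive lookup
            set.clear();
            System.out.println(present);
        }
    }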

+3

Take a look at High Performance Primitive Collections for Java (HPPC) . It is an alternative to Trove: mature and carefully designed for efficiency. See the JavaDoc for IntOpenHashSet .
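A minimal usage sketch, assuming an older HPPC release where the class is still called IntOpenHashSet (newer releases renamed it IntHashSet ):

    import com.carrotsearch.hppc.IntOpenHashSet;

    public class HppcExample {
        public static void main(String[] args) {
            IntOpenHashSet set = new IntOpenHashSet();
            set.add(42);                         // primitive int, no autoboxing
            boolean present = set.contains(42);
            set.clear();
            System.out.println(present);
        }
    }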

+1
