What load factor should be used when you know the maximum possible absence of elements in a HashSet

What load factor should I use when I really know the maximum possible absence of elements in a HashSet? I heard that a default load factor of 0.75 is recommended, as it offers good compromises between speed and space. It's right? However, the larger size of the HashSet also takes longer to create and space.

I use a HashSet just inorder to remove duplicate integers from a list of integers.

+7
source share
4 answers

I spent some time playing with load factors once, and it is shocking how few differences that actually make in practice. Even if you set it to something high, for example, 2.0, this will not slow down the work, and will not save so much memory. Just pretend it doesn't exist. Josh often regretted having ever revealed it as an option.

+5
source

For your stated problem, instead of using a HashSet, you might also consider BitSet

Depending on the range and sparseness of your integers, you may get higher performance and spatial characteristics.

+2
source

It depends on your integers. The point of the load factor is the "balancing" of the hash function: with the "perfect" hash function, your load factor can be 1.0. However, if the integer values ​​in question show any pattern, this can lead to more than average hash collisions that reduce the efficiency of the map. Then a lower load factor can help to significantly increase values ​​(over a wider range), thus reducing hash collisions.

I would not worry about creation time and additional space, using a lower load factor. I doubt that you would notice the difference (if you are not on a platform with limited hardware or do not have several million integers in your card - then the size difference can become noticeable, approximately in the range of several additional megabytes per million values).

+1
source

If you know EXACTLY how much you should have, you should set the load factor to 1 and ensure that the hash maps display 1: 1. You might want to expand your container so as not to rephrase your hash.

Note that such an “exact” thing changes over time, so you might be better off with your regular container. :)

EDIT: My answer was before I realized that these are integers.

Yes, your best bet is to simply leave as you are. You will never notice the difference.

/** * Remove duplicates from a list. * @note This will ALTER the list. * @note This is not thread safe. * @param the list (potentially with duplicates) */ void removeDuplicates(List<Integer> list) { Set<Integer> noDupe = new HashSet<Integer>(list.size()); // will end up resizing once, oh well for(Integer i : list) noDupe.add(i); list.clear(); list.addAll(noDupe); } 
0
source

All Articles