Java: HashSet vs HashMap

Question

Java: HashSet vs HashMap

I have a program working on huge data sets. Objects are best stored in hash containers as the program continues to search for objects in the container.

The first idea was to use HashMap, since the get and remove methods of this container are more suitable for the use I need.

But I came to the conclusion that using HashMap is quite a lot of memory, which is a serious problem, so I thought that switching to a HashSet would be better, because it uses only <E> and not <K,V> for each element, but when I looked at the implementation, I found out that it uses the basic HashMap! this means that it will not save memory!

So these are my questions:

Are all my assumptions true?
Is HashMap memory wasteful? more specifically, what is its overhead for each record?
Is a HashSet as wasteful as a HashMap?
Are there any other Hash-based containers that will be significantly less memory consumables?
Update

As indicated in the comments, I will expand my program a bit, hashMap is designed to store a couple of other objects, as well as some numerical value calculated on the boards. along the way, he extracts some of them and introduces new pairs. For a couple, it is necessary that she does not hold this pair or remove it. Mapping can be done using a float or hashCode value for a paired object.

Also, when I say "huge datasets", I am talking about objects ~ 4 * 10 ^ 9

+7

java memory-management hashmap hashset hash

Ravid goldenberg Feb 01 '15 at 9:51

source share

3 answers

jfun · Answer 1 · 2015-02-01T10:14:48+0000

This site has some helpful tips on collection performance in java.

HashSet built on top of HashMap< T, Object > , where the value is singleton '. This means that the memory consumption of aHashSet is identical to HashMap : to store SIZE values you need 32 * SIZE + 4 * CAPACITY bytes (plus the size of your values). This is definitely not a collective memory.
THashSet might be the easiest replacement collection for a HashSet — it implements Set and Iterable, which means you just need to update the single letter in the initialization of your set.
THashSet uses one array of objects for its values, so it uses 4 * CAPACITY bytes for storage. As you can see, compared to the JDK HashSet, you will save 32 * SIZE bytes in case of the same load factor, which is a huge improvement.

Also below the image that I took from here can help us keep something in mind for choosing the right collection

gknicker · Answer 2 · 2015-02-01T10:10:38+0000

Are all my assumptions true?

You are correct that the HashSet implemented using the HashMap , so you will not save memory using the HashSet instead.

If you are creating maps with a large number of elements, you should build your HashMap with initialCapacity for the best of your knowledge, to prevent repeated repetition (thus, memory interruption).

Is HashMap memory wasteful? more specifically, what is its overhead for each record?

No, this is not wasteful. Overhead is the underlying array (size modified by loadFactor ) and the Entry object for each key-value pair. In addition to storing the key and value, the record object also stores a pointer to the next record in the slot (in case two or more records occupy the same slot in the base array). By default, loadFactor 0.75 saves the size of the underlying array by 133% of the number of records.

In particular, the memory overhead for each record:

link of the object of the object to the key,
a reference to a value reference object,
a reference to the record object to the next record,
and a reference to the underlying array for the record (divided by the load factor).

It is very difficult to get much more trim than for a collection based on hashes.

Is a HashSet as wasteful as a HashMap?

You will not get any memory efficiency using a HashSet instead of a HashMap .

Are there any other hash-based containers that will be significantly less memory consumables?

If your keys are primitives (e.g. int s), there are custom Map and Set implementations (in third-party libraries ) that use data structures that are more efficient for storing data.

Joseph K. Strauss · Answer 3 · 2015-02-01T10:17:25+0000

It is true that a HashSet uses as much memory as a HashMap. The difference between the two is that HasSet implements Set, i.e. It does not care about any value associated with the key, but only about the presence or absence of its specific value. HashMap is engaged in storage / extraction (put / get) of values on a key.

While a HashMap / HashSet stores data in an array, which is usually slightly larger than the number of elements, this is not a very big problem because the load factor is .75. This means that the HashMap will grow when the number of elements reaches 75% of the size of the underlying array.

A bigger problem than a big map will be a lot of empty maps, since the default size for HashMap is 16. This can be compensated by setting the initial capacity to 0.

You can also use TreeMap instead, since TreeMap is based on links instead of an array, you are likely to spend even more space, especially with large maps, in addition, losing some speed. The main advantage of TreeMap is that it keeps the keys in an ordered state, so if you need to sort them, this is the way to go.

In addition, TreeMap can be used for programming if you cannot or do not want to create a custom implementation of the equals and hashCode methods of your key type. Instead, you can make a comparator for the key type. For example, to make a map / install based on a string, case insensitive, use String.CASE_INSENSITIVE_ORDER as a TreeSet comparator

Java: HashSet vs HashMap

More articles: