Multimap Space Issue: Guava

In my Java code, I am using a Guava Multimap (com.google.common.collect.Multimap), created like this:

Multimap<Integer, Integer> Index = HashMultimap.create();

Here, the Multimap key is one part of a URL, and the value is the other part of the URL (converted to an integer). I give the JVM a 2560 MB (2.5 GB) heap (using Xmx and Xms). However, it can only store about 9 million such (key, value) pairs of integers (roughly 10 million). Theoretically (going by the memory an int occupies), it should be able to store more.

Can someone help me with the following:

  • Why does Multimap use so much memory? I checked my code, and without inserting pairs into the Multimap it uses only 1/2 MB of memory.
  • Is there another way, or a home-made solution, to solve this memory problem? That is, is there a way to reduce this overhead, since I only want to store int-int pairs? In any other language? Or any other solution (home-made preferred) to the problem I am facing, such as a database or something similar?

+7
4 answers

There is a huge amount of overhead associated with Multimap. At a minimum:

  • Each key and value is an Integer object, which (at a minimum) doubles the storage requirements of each int value.
  • Each unique key in a HashMultimap is associated with a Collection of values (according to the source, the Collection is a HashSet).
  • Each HashSet is created with default space for 8 values.

Thus, each key/value pair requires (at least) an order of magnitude more space than you might expect for two int values. (Somewhat less when multiple values are stored under a single key.) I would expect 10 million key/value pairs to take perhaps 400 MB.

Although you have a 2.5 GB heap, I would not be all that surprised if that is not enough. The above estimate is, I think, on the low side. In addition, it only accounts for how much is needed to store the map once it is built. As the map grows, the table needs to be reallocated and rehashed, which temporarily doubles the amount of space used. Finally, all of this assumes that int values and object references require 4 bytes. If the JVM is using 64-bit addressing, the byte count probably doubles.
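To make the arithmetic behind this kind of estimate concrete, here is a rough back-of-the-envelope sketch. Every per-object size in it is an assumption (typical figures often quoted for a 64-bit HotSpot JVM with compressed references), not a measurement, and the class and method names are purely illustrative. It ignores the outer hash table array, temporary copies during resizing, and Integer caching.

```java
public class MultimapOverheadEstimate {
    // ASSUMED sizes in bytes; real values depend on the JVM and its settings.
    static final long INTEGER_BYTES = 16;      // boxed Integer: object header + int, aligned
    static final long HASH_ENTRY_BYTES = 32;   // one hash table entry (hash, key, value, next)
    static final long EMPTY_SET_BYTES = 64;    // HashSet plus its backing HashMap, no table yet
    static final long ARRAY_HEADER_BYTES = 16; // header of the per-set table array
    static final long REF_BYTES = 4;           // one slot in a table array

    /** Rough heap bytes for `pairs` entries spread over `keys` distinct keys. */
    static long estimateBytes(long pairs, long keys, int slotsPerSet) {
        // Each value: a boxed Integer plus its entry in the per-key set.
        long perValue = INTEGER_BYTES + HASH_ENTRY_BYTES;
        // Each key: a boxed Integer, its entry in the outer map, and its own set with a table.
        long perKey = INTEGER_BYTES + HASH_ENTRY_BYTES
                + EMPTY_SET_BYTES + ARRAY_HEADER_BYTES + REF_BYTES * slotsPerSet;
        return pairs * perValue + keys * perKey;
    }

    public static void main(String[] args) {
        long pairs = 10_000_000L;
        long rawBytes = pairs * 2 * Integer.BYTES; // two plain ints per pair
        System.out.println("raw ints:           " + rawBytes / 1_000_000 + " MB");
        System.out.println("one value per key:  "
                + estimateBytes(pairs, pairs, 8) / 1_000_000 + " MB");
        System.out.println("100 values per key: "
                + estimateBytes(pairs, pairs / 100, 256) / 1_000_000 + " MB");
    }
}
```

The point of the sketch is only the shape of the calculation: the per-key cost dominates when most keys have a single value, which is why the total shrinks when several values share a key.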

+9

Probably the easiest way to minimize the memory overhead would be to mix Trove's primitive collection implementations (to avoid the memory overhead of boxing) with Guava's Multimap, something like

 // Back the multimap with Trove primitive collections, exposed as java.util views.
 SetMultimap<Integer, Integer> multimap = Multimaps.newSetMultimap(
     TDecorators.wrap(new TIntObjectHashMap<Collection<Integer>>()),
     new Supplier<Set<Integer>>() {
       public Set<Integer> get() {
         return TDecorators.wrap(new TIntHashSet());
       }
     });

This still has the overhead of boxing and unboxing on demand, but the memory it consumes just sitting there will be greatly reduced.

+4

It looks like you need a sparse Boolean matrix. The question "Sparse matrices / arrays in Java" should contain pointers to library code. Then, instead of putting (i, j) in the multimap, just put a 1 in the matrix at [i][j].
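For a home-made variant of the same idea that needs no library, one option is to store only the coordinates of the true cells, packing each (i, j) pair into a single long and keeping them in a sorted primitive array. The sketch below is an illustration under that assumption, not a library API; the class name PackedPairSet and its methods are hypothetical. It costs 8 bytes per pair plus array slack, with no boxing at all.

```java
import java.util.Arrays;

public class PackedPairSet {
    private long[] pairs = new long[16];
    private int size = 0;
    private boolean sorted = true;

    /** Packs (key, value) into one long: key in the high 32 bits, value in the low 32. */
    static long pack(int key, int value) {
        return ((long) key << 32) | (value & 0xFFFFFFFFL);
    }

    /** Appends a pair; sorting and duplicate removal are deferred until the next query. */
    public void add(int key, int value) {
        if (size == pairs.length) {
            pairs = Arrays.copyOf(pairs, size * 2);
        }
        pairs[size++] = pack(key, value);
        sorted = false;
    }

    private void ensureSorted() {
        if (sorted) return;
        Arrays.sort(pairs, 0, size);
        int out = 0; // compact in place, dropping duplicate pairs
        for (int i = 0; i < size; i++) {
            if (out == 0 || pairs[i] != pairs[out - 1]) pairs[out++] = pairs[i];
        }
        size = out;
        sorted = true;
    }

    public boolean contains(int key, int value) {
        ensureSorted();
        return Arrays.binarySearch(pairs, 0, size, pack(key, value)) >= 0;
    }

    /** Values stored under key; negative values sort after non-negative ones. */
    public int[] valuesFor(int key) {
        ensureSorted();
        long target = (long) key << 32; // smallest packed value with this key
        int lo = 0, hi = size;
        while (lo < hi) { // find the first index whose packed value is >= target
            int mid = (lo + hi) >>> 1;
            if (pairs[mid] < target) lo = mid + 1; else hi = mid;
        }
        int to = lo;
        while (to < size && (int) (pairs[to] >> 32) == key) to++;
        int[] values = new int[to - lo];
        for (int i = lo; i < to; i++) values[i - lo] = (int) pairs[i];
        return values;
    }

    /** Number of distinct pairs stored. */
    public int size() {
        ensureSorted();
        return size;
    }
}
```

This layout suits a load-then-query workload: inserts are cheap appends, and the first lookup after a batch of inserts pays for one sort. Growing the array still copies it, so peak usage during loading is higher than the final footprint.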

+1

You can probably use ArrayListMultimap, which requires less memory than HashMultimap, since ArrayLists are smaller than HashSets. Or you can modify Louis's Trove solution, replacing the Set with a List, to reduce memory usage further.

Some applications depend on the fact that HashMultimap satisfies the SetMultimap interface, but most do not.

0
