300 million items on the map

  • If each of them is guaranteed to have a unique key (generated and forced external key management system), whose correct card implementation is right for me? Suppose this should be optimized for simultaneous searches (data is initialized once during application launch).
  • Does this 300 million unique keys have any positive or negative effects on bucketing / collisions?
  • Any other suggestions?

My map will look something like this.

Map<String, <boolean, boolean, boolean, boolean>> 
+6
source share
4 answers

I would not use a card, it needs a lot of memory. Especially in your case. Store the values ​​in one data array and store the keys in a sorted array of indices. In a sorted array, you use binSearch to find the key position in the [] data.

The hard part will be to create an array without running out of memory.

you do not need to consider concurreny because you are only reading data

In the future, try to avoid using a string key. try converting them to long ones.
advantage of this solution: the search time does not exceed log n. even in worst cases when keys create hashcode problems

+3
source

Another suggestion? You are betting.

Using the proper keystore, Redis is the first option that comes to mind. Sure it is a separate process and dependency, but you will gain a lot of time when it comes to the correct design of the system.

There must be a very good reason why you would like to associate your business logic with several gigabytes of data in the same process memory, even if it is ephemeral. I tried this several times and was always wrong.

+3
source

It seems to me that you can just use TreeMap because it will give you O(log(n)) to search for data due to its ordered structure. This is also an acceptable method, since, as you said, all data will be loaded at startup.

0
source

If you need to keep everything in memory, you will need to use some library designed for use with as many elements as Huge collections . In addition, if the number of entries is large, you also need to think about more complex solutions, such as a non-blocking hash map

0
source

All Articles