If the actual data indeed resembles pizzas, toppings and crusts, i.e. there are only a handful of distinct toppings/crusts and thousands of pizzas contain each of them, I would say that a proper multimap is overkill for this case, and you would be better off with pepperoni_pizzas.dat, onions_pizzas.dat, ... as distinct appendable shared lists of pizza UUIDs; you can use Chronicle Queue to access and update them from multiple processes conveniently.
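As an illustration, here is a minimal sketch of such an appendable per-topping list built on Chronicle Queue (assuming Chronicle Queue 5.x; the directory name and the text encoding of the UUIDs are arbitrary choices for this example — note that Chronicle Queue persists to a directory rather than a single .dat file):

```java
import java.util.UUID;
import net.openhft.chronicle.queue.ChronicleQueue;
import net.openhft.chronicle.queue.ExcerptAppender;
import net.openhft.chronicle.queue.ExcerptTailer;

public class PizzaListSketch {
    public static void main(String[] args) {
        // One queue per topping, e.g. "pepperoni_pizzas"; it can be shared
        // between multiple processes on the same machine.
        try (ChronicleQueue queue = ChronicleQueue.singleBuilder("pepperoni_pizzas").build()) {
            // Append a new pizza UUID to the end of the shared list.
            ExcerptAppender appender = queue.acquireAppender();
            appender.writeText(UUID.randomUUID().toString());

            // Read the whole list back from the beginning.
            ExcerptTailer tailer = queue.createTailer();
            String uuid;
            while ((uuid = tailer.readText()) != null) {
                System.out.println(uuid);
            }
        }
    }
}
```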
If there are tens to hundreds of thousands of distinct toppings/crusts, and only tens to hundreds of pizzas per topping/crust on average, you should indeed use a multimap.
Essentially, there are 3 kinds of "problems" with Chronicle-Maps-as-multimaps:
Excessive garbage allocation on each query
If you create a Chronicle Map with List<UUID> or Set<UUID> value type without specifying custom value serializers, it will work, but it will be utterly inefficient, because it will default to built-in Java serialization for serializing and deserializing the whole value collection on each request, reusing neither the collection heap objects nor the individual UUID heap objects for the elements. Hence a lot of garbage will be generated on each request to the Chronicle Map.
Solution

If you specify the value serializer as ListMarshaller or SetMarshaller (or your own collection marshaller, which you could write based on ListMarshaller and SetMarshaller) in conjunction with reusable UUID heap objects, it will resolve this garbage problem:
```java
ListMarshaller<ReusableUuid> valueMarshaller = ListMarshaller.of(
    ReusableUuidReader.INSTANCE, ReusableUuidWriter.INSTANCE);
List<ReusableUuid> averageValue = Stream
    .generate(() -> ReusableUuid.random())
    .limit(averagePizzasForTopping)
    .collect(Collectors.toList());
ChronicleMap<Topping, List<ReusableUuid>> map = ChronicleMap
    .of(Topping.class, (Class<List<ReusableUuid>>) (Class) List.class)
    .averageKey(pepperoni)
    .valueMarshaller(valueMarshaller)
    .averageValue(averageValue)
    .entries(numberOfToppings)
    .createPersistedTo(new File("toppings_to_pizza_ids.dat"));
```
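ReusableUuid, ReusableUuidReader and ReusableUuidWriter above are user-defined types, not part of the Chronicle Map API. A possible sketch of them, assuming Chronicle Map 3.x's BytesReader/BytesWriter serialization interfaces (the enum singleton pattern keeps the marshallers stateless):

```java
import java.util.concurrent.ThreadLocalRandom;
import net.openhft.chronicle.bytes.Bytes;
import net.openhft.chronicle.hash.serialization.BytesReader;
import net.openhft.chronicle.hash.serialization.BytesWriter;

// A mutable UUID holder, so deserialization can reuse instances
// instead of allocating a new object per element.
public final class ReusableUuid {
    private long mostSigBits;
    private long leastSigBits;

    public static ReusableUuid random() {
        ReusableUuid uuid = new ReusableUuid();
        ThreadLocalRandom random = ThreadLocalRandom.current();
        uuid.set(random.nextLong(), random.nextLong());
        return uuid;
    }

    public void set(long mostSigBits, long leastSigBits) {
        this.mostSigBits = mostSigBits;
        this.leastSigBits = leastSigBits;
    }

    public long mostSigBits() { return mostSigBits; }
    public long leastSigBits() { return leastSigBits; }
}

enum ReusableUuidReader implements BytesReader<ReusableUuid> {
    INSTANCE;

    @Override
    public ReusableUuid read(Bytes in, ReusableUuid using) {
        if (using == null) using = new ReusableUuid(); // reuse when possible
        using.set(in.readLong(), in.readLong());
        return using;
    }
}

enum ReusableUuidWriter implements BytesWriter<ReusableUuid> {
    INSTANCE;

    @Override
    public void write(Bytes out, ReusableUuid toWrite) {
        out.writeLong(toWrite.mostSigBits());
        out.writeLong(toWrite.leastSigBits());
    }
}
```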
Inefficient value updates and replication
When you add another pizza UUID to a list of 100 UUIDs and insert the new value back into the Chronicle Map, Chronicle Map will rewrite the whole list again, rather than appending one UUID to the end of the off-heap memory chunk. And if you use replication, it will send the whole list of 100 UUIDs as the updated value to the other nodes, instead of sending only the one added UUID.
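In other words, the natural read-modify-write pattern pays the full cost of the list on every update (the variable names here are illustrative, referring to the map built above):

```java
// Reads and deserializes all 100 existing UUIDs from off-heap memory.
List<ReusableUuid> pizzas = map.get(topping);
// Appends a single element on-heap.
pizzas.add(newPizzaUuid);
// Serializes and rewrites all 101 UUIDs; with replication enabled,
// the entire new list is shipped to the other nodes as well.
map.put(topping, pizzas);
```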
Both (value updates and replication) could be optimized using terrible hacks, but that would require very deep knowledge of the Chronicle Map implementation and would be very fragile.
Chronicle Map memory fragmentation
If you plan to add new pizzas during the data store's lifetime, the memory areas initially allocated for entries will become too small to hold new values with more UUIDs, so memory areas will be re-allocated (possibly several times for each UUID list). Chronicle Map's data structure implies a simplified memory allocation scheme, which suffers badly from fragmentation if entries are re-allocated many times.
If you have many UUIDs in the lists and you run your application on Linux, you can mitigate this problem by pre-allocating a generous amount of memory (more than will practically be needed for any list) for each entry (by specifying the .actualChunkSize() configuration in ChronicleMapBuilder) and relying on Linux's lazy mapping of memory pages to physical memory (as they are touched). That way you will lose at most 4 KB of memory per UUID list, which may be acceptable if the lists are many KB in size.
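A sketch of such over-allocation, reusing the builder from the example above (the 4096-UUID upper bound is an arbitrary assumption for illustration):

```java
ChronicleMap<Topping, List<ReusableUuid>> map = ChronicleMap
    .of(Topping.class, (Class<List<ReusableUuid>>) (Class) List.class)
    .averageKey(pepperoni)
    .valueMarshaller(valueMarshaller)
    .averageValue(averageValue)
    .entries(numberOfToppings)
    // Reserve a chunk big enough for ~4096 UUIDs (16 bytes each) per entry,
    // so growing lists never trigger re-allocation. On Linux, untouched pages
    // are not backed by physical memory until a list actually grows into them.
    .actualChunkSize(4096 * 16)
    .createPersistedTo(new File("toppings_to_pizza_ids.dat"));
```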
On the other hand, if your lists are that long (and they are lists of UUIDs, i.e. small structures) while you have only 100,000 pizzas in total, you don't need a multimap in the first place; see the beginning of this answer.
The trick of over-allocating memory and relying on lazy memory mapping on Linux would also work for short lists (collections) of values, but only if the elements themselves are big, so that the average total value size is many KB.
Fragmentation is also less of a problem if you can avoid entry re-allocation in some other way, i.e. new pizza UUIDs are added over time but are also removed, so topping-to-UUIDs list sizes float around some average and re-allocation rarely kicks in.
Memory fragmentation is never a problem if values are never updated (or never change size) after the entry is inserted into the Chronicle Map.
Conclusion
In some use cases and with proper configuration, Chronicle Map can serve well as a multimap. In other cases, Chronicle Map as a multimap is inherently inefficient.
Factors that matter:
- The total number of key -> List<Value> entries in the multimap
- The total number of values
- Average and distribution of key sizes
- Average and distribution of individual value sizes
- Average and distribution of value list sizes
- Value list dynamics over the Chronicle Map's lifetime (never updated, append-only, or elements removed as well as added; removals from the beginning and middle of lists are more expensive)
- Whether the Chronicle Map is replicated or not