If the actual data indeed resembles pizzas, toppings and crusts, i.e. there are only a handful of distinct toppings/crusts and thousands of pizzas contain each of them, I would say that a proper multimap is overkill for this case, and you would be better off with pepperoni_pizzas.dat, onions_pizzas.dat, ... as distinct appendable shared lists of pizza UUIDs; you can use Chronicle Queue to access and update them from multiple processes conveniently.
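As an illustration, here is a minimal sketch of such an appendable per-topping list built on Chronicle Queue (assuming Chronicle Queue 5.x; the directory name and the text encoding of the UUIDs are arbitrary choices for this example — note that Chronicle Queue persists to a directory rather than a single .dat file):

```java
import java.util.UUID;
import net.openhft.chronicle.queue.ChronicleQueue;
import net.openhft.chronicle.queue.ExcerptAppender;
import net.openhft.chronicle.queue.ExcerptTailer;

public class PizzaListSketch {
    public static void main(String[] args) {
        // One queue per topping, e.g. "pepperoni_pizzas"; it can be shared
        // between multiple processes on the same machine.
        try (ChronicleQueue queue = ChronicleQueue.singleBuilder("pepperoni_pizzas").build()) {
            // Append a new pizza UUID to the end of the shared list.
            ExcerptAppender appender = queue.acquireAppender();
            appender.writeText(UUID.randomUUID().toString());

            // Read the whole list back from the beginning.
            ExcerptTailer tailer = queue.createTailer();
            String uuid;
            while ((uuid = tailer.readText()) != null) {
                System.out.println(uuid);
            }
        }
    }
}
```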
If there are tens to hundreds of thousands of distinct toppings/crusts, and only tens to hundreds of pizzas per topping/crust on average, you should indeed use a multimap.
Essentially, there are 3 kinds of "problems" with Chronicle-Maps-as-multimaps:
Excessive garbage allocation on each query
If you create a Chronicle Map with List<UUID> or Set<UUID> value type without specifying custom value serializers, it will work, but it will be utterly inefficient, because it will default to built-in Java serialization for serializing and deserializing the whole value collection on each request, reusing neither the collection heap objects nor the individual UUID heap objects for the elements. Hence a lot of garbage will be generated on each request to the Chronicle Map.
Solution

If you specify the value serializer as ListMarshaller or SetMarshaller (or your own collection marshaller, which you could write based on ListMarshaller and SetMarshaller) in conjunction with reusable UUID heap objects, it will resolve this garbage problem:
```java
ListMarshaller<ReusableUuid> valueMarshaller = ListMarshaller.of(
    ReusableUuidReader.INSTANCE, ReusableUuidWriter.INSTANCE);
List<ReusableUuid> averageValue = Stream
    .generate(() -> ReusableUuid.random())
    .limit(averagePizzasForTopping)
    .collect(Collectors.toList());
ChronicleMap<Topping, List<ReusableUuid>> map = ChronicleMap
    .of(Topping.class, (Class<List<ReusableUuid>>) (Class) List.class)
    .averageKey(pepperoni)
    .valueMarshaller(valueMarshaller)
    .averageValue(averageValue)
    .entries(numberOfToppings)
    .createPersistedTo(new File("toppings_to_pizza_ids.dat"));
```
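ReusableUuid, ReusableUuidReader and ReusableUuidWriter above are user-defined types, not part of the Chronicle Map API. A possible sketch of them, assuming Chronicle Map 3.x's BytesReader/BytesWriter serialization interfaces (the enum singleton pattern keeps the marshallers stateless):

```java
import java.util.concurrent.ThreadLocalRandom;
import net.openhft.chronicle.bytes.Bytes;
import net.openhft.chronicle.hash.serialization.BytesReader;
import net.openhft.chronicle.hash.serialization.BytesWriter;

// A mutable UUID holder, so deserialization can reuse instances
// instead of allocating a new object per element.
public final class ReusableUuid {
    private long mostSigBits;
    private long leastSigBits;

    public static ReusableUuid random() {
        ReusableUuid uuid = new ReusableUuid();
        ThreadLocalRandom random = ThreadLocalRandom.current();
        uuid.set(random.nextLong(), random.nextLong());
        return uuid;
    }

    public void set(long mostSigBits, long leastSigBits) {
        this.mostSigBits = mostSigBits;
        this.leastSigBits = leastSigBits;
    }

    public long mostSigBits() { return mostSigBits; }
    public long leastSigBits() { return leastSigBits; }
}

enum ReusableUuidReader implements BytesReader<ReusableUuid> {
    INSTANCE;

    @Override
    public ReusableUuid read(Bytes in, ReusableUuid using) {
        if (using == null) using = new ReusableUuid(); // reuse when possible
        using.set(in.readLong(), in.readLong());
        return using;
    }
}

enum ReusableUuidWriter implements BytesWriter<ReusableUuid> {
    INSTANCE;

    @Override
    public void write(Bytes out, ReusableUuid toWrite) {
        out.writeLong(toWrite.mostSigBits());
        out.writeLong(toWrite.leastSigBits());
    }
}
```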
Inefficient value updates and replication
When you add another pizza UUID to a list of 100 UUIDs and insert the new value back into the Chronicle Map, Chronicle Map will rewrite the whole list again, rather than appending one UUID to the end of the off-heap memory chunk. And if you use replication, it will send the whole list of 100 UUIDs as the updated value to the other nodes, instead of sending only the one added UUID.
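In other words, the natural read-modify-write pattern pays the full cost of the list on every update (the variable names here are illustrative, referring to the map built above):

```java
// Reads and deserializes all 100 existing UUIDs from off-heap memory.
List<ReusableUuid> pizzas = map.get(topping);
// Appends a single element on-heap.
pizzas.add(newPizzaUuid);
// Serializes and rewrites all 101 UUIDs; with replication enabled,
// the entire new list is shipped to the other nodes as well.
map.put(topping, pizzas);
```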
Both (value updates and replication) could be optimized using terrible hacks, but that would require very deep knowledge of the Chronicle Map implementation and would be very fragile.
Chronicle Map memory fragmentation
If you plan to add new pizzas during the data store's lifetime, the memory areas initially allocated for entries will become too small to hold new values with more UUIDs, so memory areas will be re-allocated (possibly several times for each UUID list). Chronicle Map's data structure implies a simplified memory allocation scheme, which suffers badly from fragmentation if entries are re-allocated many times.
If you have many UUIDs in the lists and you run your application on Linux, you can mitigate this problem by pre-allocating a generous amount of memory (more than will practically be needed for any list) for each entry (by specifying the .actualChunkSize() configuration in ChronicleMapBuilder) and relying on Linux's lazy mapping of memory pages to physical memory (as they are touched). That way you will lose at most 4 KB of memory per UUID list, which may be acceptable if the lists are many KB in size.
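A sketch of such over-allocation, reusing the builder from the example above (the 4096-UUID upper bound is an arbitrary assumption for illustration):

```java
ChronicleMap<Topping, List<ReusableUuid>> map = ChronicleMap
    .of(Topping.class, (Class<List<ReusableUuid>>) (Class) List.class)
    .averageKey(pepperoni)
    .valueMarshaller(valueMarshaller)
    .averageValue(averageValue)
    .entries(numberOfToppings)
    // Reserve a chunk big enough for ~4096 UUIDs (16 bytes each) per entry,
    // so growing lists never trigger re-allocation. On Linux, untouched pages
    // are not backed by physical memory until a list actually grows into them.
    .actualChunkSize(4096 * 16)
    .createPersistedTo(new File("toppings_to_pizza_ids.dat"));
```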
On the other hand, if your lists are that long (and they are lists of UUIDs, i.e. small structures) while you have only 100,000 pizzas in total, you don't need a multimap in the first place; see the beginning of this answer.
The trick of over-allocating memory and relying on lazy memory mapping on Linux would also work for short lists (collections) of values, but only if the elements themselves are big, so that the average total value size is many KB.
Fragmentation is also less of a problem if you can avoid entry re-allocation in some other way, i.e. new pizza UUIDs are added over time but are also removed, so topping-to-UUIDs list sizes float around some average and re-allocation rarely kicks in.
Memory fragmentation is never a problem if values are never updated (or never change size) after the entry is inserted into the Chronicle Map.
Conclusion
In some use cases and with proper configuration, Chronicle Map can serve well as a multimap. In other cases, Chronicle Map as a multimap is inherently inefficient.
Factors that matter:
- The total number of key -> List<Value> entries in the multimap
- The total number of values
- Average and distribution of key sizes
- Average and distribution of individual value sizes
- Average and distribution of value list sizes
- Value list dynamics over the Chronicle Map's lifetime (never updated, append-only, or elements removed as well as added; removals from the beginning and middle of lists are more expensive)
- Whether the Chronicle Map is replicated or not