Context
Our application stores a lot of data in memory in many different types of cards to provide a quick search. To make it simple (and not considering primitive maps), it is always a map with one or more keys. Performance is a big requirement for us.
Problem
I wanted to find the most efficient map implementation and, as suggested here , I compared these implementations:
Card maps (nested maps) based on java.util.HashMap specifically for 3 keys:
Map<K1, Map<K2, Map<K3, V>>>
Wrapper key (tuples as keys) in java.util.HashMap
Map<Triple<K1, K2, K3>, V>
Tuples as keys in the file net.openhft.koloboke.collect.map.hash.HashObjObjMap, which according to this should be (one of) the fastest display.
HashObjObjMap<Triple<K1, K2, K3>, V>
Expectations
- Nested cards will have the fastest GET and the slowest PUT.
- Koloboke hash map will be faster than jdk HashMap.
results
Benchmark Mode Cnt Score Error Units TupleVsNestedMapsBenchmark.benchGetFromNestedMap avgt 20 11.586 ± 0.205 ns/op TupleVsNestedMapsBenchmark.benchGetFromTupleKolobokeMap avgt 20 18.619 ± 0.113 ns/op TupleVsNestedMapsBenchmark.benchGetFromTupleMap avgt 20 8.985 ± 0.085 ns/op TupleVsNestedMapsBenchmark.benchPutToNestedMap avgt 20 15.106 ± 0.142 ns/op TupleVsNestedMapsBenchmark.benchPutToTupleKolobokeMap avgt 20 22.533 ± 0.335 ns/op TupleVsNestedMapsBenchmark.benchPutToTupleMap avgt 20 8.884 ± 0.084 ns/op
Benchmark
@OutputTimeUnit(TimeUnit.NANOSECONDS) @BenchmarkMode(Mode.AverageTime) @OperationsPerInvocation(100000) @Fork(1) @Warmup(iterations = 10) @Measurement(iterations = 20) public class TupleVsNestedMapsBenchmark { public static final int N = 10000; static ObjObjObjObjHashMap<String, String, String, Integer> sourceNestedMap = new ObjObjObjObjHashMap<>(); static Map<Triple<String, String, String>, Integer> sourceTupleMap = new HashMap<>(); static HashObjObjMap<Triple<String, String, String>, Integer> sourceTupleKMap = HashObjObjMaps.newMutableMap(); static { for (int i = 0; i < N; i++) { sourceNestedMap.put("a-" + i, "b-" + i, "c-" + i, i); sourceTupleMap.put(ImmutableTriple.of("a-" + i, "b-" + i, "c-" + i), i); sourceTupleKMap.put(ImmutableTriple.of("a-" + i, "b-" + i, "c-" + i), i); } } @Benchmark public List<Integer> benchGetFromNestedMap() { return benchmarkGet(sourceNestedMap::get); } @Benchmark public List<Integer> benchGetFromTupleMap() { return benchmarkGet(((key1, key2, key3) -> sourceTupleMap.get(ImmutableTriple.of(key1, key2, key3)))); } @Benchmark public List<Integer> benchGetFromTupleKolobokeMap() { return benchmarkGet(((key1, key2, key3) -> sourceTupleKMap.get(ImmutableTriple.of(key1, key2, key3)))); } @Benchmark public ObjObjObjObjHashMap<String, String, String, Integer> benchPutToNestedMap() { ObjObjObjObjHashMap<String, String, String, Integer> map = new ObjObjObjObjHashMap<>(); benchmarkPut(map::put); return map; } @Benchmark public Map<Triple<String, String, String>, Integer> benchPutToTupleMap() { Map<Triple<String, String, String>, Integer> map = new HashMap<>(); benchmarkPut((key1, key2, key3, value) -> map.put(ImmutableTriple.of(key1, key2, key3), value)); return map; } @Benchmark public Map<Triple<String, String, String>, Integer> benchPutToTupleKolobokeMap() { HashObjObjMap<Triple<String, String, String>, Integer> map = HashObjObjMaps.newMutableMap(); benchmarkPut((key1, key2, key3, value) -> map.put(ImmutableTriple.of(key1, key2, key3), value)); return map; } private List<Integer> benchmarkGet(MapValueSupplier<Integer> mapValueSupplier) { List<Integer> result = new ArrayList<>(N); for (int i = 0; i < N; i++) { result.add(mapValueSupplier.supply("a-" + i, "b-" + i, "c-" + i)); } return result; } private void benchmarkPut(PutValueFunction<Integer> putValueFunction) { for (int i = 0; i < N; i++) { putValueFunction.apply("a-" + i, "b-" + i, "c-" + i, i); } } private interface MapValueSupplier<T> { T supply(String key1, String key2, String key3); } private interface PutValueFunction<T> { void apply(String key1, String key2, String key3, T value); } }
Note: please do not suggest using primitive maps. An integer like (value) is just an example of a cheap object.
Questions
- Why is the kolobok map 2.5 times slower than the jdk card?
- Why aren't nested cards faster? (I expect the distribution overhead for the key tuple to be larger.)
- Or is my test incorrect? Then, how can I improve this?
Update
Based on good tips from @leventov, I changed Benchmark and also tried the Triple implementation, which caches the hash code (and has a better distribution) - tests are called Tuple2.
@State(Scope.Thread) @OutputTimeUnit(TimeUnit.NANOSECONDS) @BenchmarkMode(Mode.AverageTime) @OperationsPerInvocation(TupleVsNestedMapsBenchmark.TOTAL_OPS) @Fork(1) @Warmup(iterations = 5) @Measurement(iterations = 20) public class TupleVsNestedMapsBenchmark { static final int N = 30; static final int TOTAL_OPS = N * N * N; private ObjObjObjObjHashMap<String, String, String, Integer> sourceNestedMap; private Map<Triple<String, String, String>, Integer> sourceTupleMap; private HashObjObjMap<Triple<String, String, String>, Integer> sourceTupleKMap; private Map<Triple<String, String, String>, Integer> sourceTuple2Map; private HashObjObjMap<Triple<String, String, String>, Integer> sourceTuple2KMap; private String[] keys; @Setup public void init() { sourceNestedMap = new ObjObjObjObjHashMap<>(); sourceTupleMap = new HashMap<>(TOTAL_OPS); sourceTupleKMap = HashObjObjMaps.newMutableMap(TOTAL_OPS); sourceTuple2Map = new HashMap<>(TOTAL_OPS); sourceTuple2KMap = HashObjObjMaps.newMutableMap(TOTAL_OPS); keys = new String[N]; for (int i = 0; i < N; i++) { keys[i] = "k" + i; } for (int i = 0; i < N; i++) { for (int j = 0; j < N; j++) { for (int k = 0; k < N; k++) { sourceNestedMap.put(keys[i], keys[j], keys[k], i); sourceTupleMap.put(ImmutableTriple.of(keys[i], keys[j], keys[k]), i); sourceTupleKMap.put(ImmutableTriple.of(keys[i], keys[j], keys[k]), i); sourceTuple2Map.put(ImmutableTriple2.of(keys[i], keys[j], keys[k]), i); sourceTuple2KMap.put(ImmutableTriple2.of(keys[i], keys[j], keys[k]), i); } } } } @Benchmark public List<Integer> benchGetFromNestedMap() { return benchmarkGet(sourceNestedMap::get); } @Benchmark public List<Integer> benchGetFromTupleMap() { return benchmarkGet(((key1, key2, key3) -> sourceTupleMap.get(ImmutableTriple.of(key1, key2, key3)))); } @Benchmark public List<Integer> benchGetFromTupleKolobokeMap() { return benchmarkGet(((key1, key2, key3) -> sourceTupleKMap.get(ImmutableTriple.of(key1, key2, key3)))); } @Benchmark public List<Integer> benchGetFromTuple2Map() { return benchmarkGet(((key1, key2, key3) -> sourceTuple2Map.get(ImmutableTriple2.of(key1, key2, key3)))); } @Benchmark public List<Integer> benchGetFromTuple2KolobokeMap() { return benchmarkGet(((key1, key2, key3) -> sourceTuple2KMap.get(ImmutableTriple2.of(key1, key2, key3)))); } @Benchmark public ObjObjObjObjHashMap<String, String, String, Integer> benchPutToNestedMap() { ObjObjObjObjHashMap<String, String, String, Integer> map = new ObjObjObjObjHashMap<>(); benchmarkPut(map::put); return map; } @Benchmark public Map<Triple<String, String, String>, Integer> benchPutToTupleMap() { Map<Triple<String, String, String>, Integer> map = new HashMap<>(); benchmarkPut((key1, key2, key3, value) -> map.put(ImmutableTriple.of(key1, key2, key3), value)); return map; } @Benchmark public Map<Triple<String, String, String>, Integer> benchPutToTupleKolobokeMap() { HashObjObjMap<Triple<String, String, String>, Integer> map = HashObjObjMaps.newMutableMap(); benchmarkPut((key1, key2, key3, value) -> map.put(ImmutableTriple.of(key1, key2, key3), value)); return map; } @Benchmark public Map<Triple<String, String, String>, Integer> benchPutToTuple2Map() { Map<Triple<String, String, String>, Integer> map = new HashMap<>(); benchmarkPut((key1, key2, key3, value) -> map.put(ImmutableTriple2.of(key1, key2, key3), value)); return map; } @Benchmark public Map<Triple<String, String, String>, Integer> benchPutToTuple2KolobokeMap() { HashObjObjMap<Triple<String, String, String>, Integer> map = HashObjObjMaps.newMutableMap(); benchmarkPut((key1, key2, key3, value) -> map.put(ImmutableTriple2.of(key1, key2, key3), value)); return map; } private List<Integer> benchmarkGet(MapValueSupplier<Integer> mapValueSupplier) { List<Integer> result = new ArrayList<>(TOTAL_OPS); for (int i = 0; i < N; i++) { for (int j = 0; j < N; j++) { for (int k = 0; k < N; k++) { Integer value = mapValueSupplier.supply(keys[i], keys[j], keys[k]); result.add(value); } } } return result; } private void benchmarkPut(PutValueFunction<Integer> putValueFunction) { Integer value = 1; for (int i = 0; i < N; i++) { for (int j = 0; j < N; j++) { for (int k = 0; k < N; k++) { putValueFunction.apply(keys[i], keys[j], keys[k], value); } } } } private interface MapValueSupplier<T> { T supply(String key1, String key2, String key3); } private interface PutValueFunction<T> { void apply(String key1, String key2, String key3, T value); } }
and the results are as follows:
Benchmark Mode Cnt Score Error Units TupleVsNestedMapsBenchmark.benchGetFromNestedMap avgt 20 24.524 ± 0.144 ns/op TupleVsNestedMapsBenchmark.benchGetFromTuple2KolobokeMap avgt 20 65.604 ± 1.135 ns/op TupleVsNestedMapsBenchmark.benchGetFromTuple2Map avgt 20 22.653 ± 0.745 ns/op TupleVsNestedMapsBenchmark.benchGetFromTupleKolobokeMap avgt 20 34824.901 ± 1718.183 ns/op TupleVsNestedMapsBenchmark.benchGetFromTupleMap avgt 20 2565.835 ± 57.402 ns/op TupleVsNestedMapsBenchmark.benchPutToNestedMap avgt 20 43.160 ± 0.340 ns/op TupleVsNestedMapsBenchmark.benchPutToTuple2KolobokeMap avgt 20 237.300 ± 3.362 ns/op TupleVsNestedMapsBenchmark.benchPutToTuple2Map avgt 20 40.952 ± 0.535 ns/op TupleVsNestedMapsBenchmark.benchPutToTupleKolobokeMap avgt 20 52315.769 ± 399.769 ns/op TupleVsNestedMapsBenchmark.benchPutToTupleMap avgt 20 3205.538 ± 44.306 ns/op
Summary
- The tuple approach can be very slow if the key class hash function is not cached and / or well distributed, especially for koloboke.
- And as also concluded here (in this case (Obj-Obj)), java.util.HashMap is "extremely" fast.