Multi Key Maps - Performance Comparison

Question

Multi Key Maps - Performance Comparison

Context

Our application stores a lot of data in memory in many different types of cards to provide a quick search. To make it simple (and not considering primitive maps), it is always a map with one or more keys. Performance is a big requirement for us.

Problem

I wanted to find the most efficient map implementation and, as suggested here , I compared these implementations:

Card maps (nested maps) based on java.util.HashMap specifically for 3 keys:
```
Map<K1, Map<K2, Map<K3, V>>> 
```
Wrapper key (tuples as keys) in java.util.HashMap
```
 Map<Triple<K1, K2, K3>, V> 
```
Tuples as keys in the file net.openhft.koloboke.collect.map.hash.HashObjObjMap, which according to this should be (one of) the fastest display.
```
 HashObjObjMap<Triple<K1, K2, K3>, V> 
```

Expectations

Nested cards will have the fastest GET and the slowest PUT.
Koloboke hash map will be faster than jdk HashMap.

results

 Benchmark Mode Cnt Score Error Units TupleVsNestedMapsBenchmark.benchGetFromNestedMap avgt 20 11.586 ± 0.205 ns/op TupleVsNestedMapsBenchmark.benchGetFromTupleKolobokeMap avgt 20 18.619 ± 0.113 ns/op TupleVsNestedMapsBenchmark.benchGetFromTupleMap avgt 20 8.985 ± 0.085 ns/op TupleVsNestedMapsBenchmark.benchPutToNestedMap avgt 20 15.106 ± 0.142 ns/op TupleVsNestedMapsBenchmark.benchPutToTupleKolobokeMap avgt 20 22.533 ± 0.335 ns/op TupleVsNestedMapsBenchmark.benchPutToTupleMap avgt 20 8.884 ± 0.084 ns/op

Benchmark

 @OutputTimeUnit(TimeUnit.NANOSECONDS) @BenchmarkMode(Mode.AverageTime) @OperationsPerInvocation(100000) @Fork(1) @Warmup(iterations = 10) @Measurement(iterations = 20) public class TupleVsNestedMapsBenchmark { public static final int N = 10000; static ObjObjObjObjHashMap<String, String, String, Integer> sourceNestedMap = new ObjObjObjObjHashMap<>(); static Map<Triple<String, String, String>, Integer> sourceTupleMap = new HashMap<>(); static HashObjObjMap<Triple<String, String, String>, Integer> sourceTupleKMap = HashObjObjMaps.newMutableMap(); static { for (int i = 0; i < N; i++) { sourceNestedMap.put("a-" + i, "b-" + i, "c-" + i, i); sourceTupleMap.put(ImmutableTriple.of("a-" + i, "b-" + i, "c-" + i), i); sourceTupleKMap.put(ImmutableTriple.of("a-" + i, "b-" + i, "c-" + i), i); } } @Benchmark public List<Integer> benchGetFromNestedMap() { return benchmarkGet(sourceNestedMap::get); } @Benchmark public List<Integer> benchGetFromTupleMap() { return benchmarkGet(((key1, key2, key3) -> sourceTupleMap.get(ImmutableTriple.of(key1, key2, key3)))); } @Benchmark public List<Integer> benchGetFromTupleKolobokeMap() { return benchmarkGet(((key1, key2, key3) -> sourceTupleKMap.get(ImmutableTriple.of(key1, key2, key3)))); } @Benchmark public ObjObjObjObjHashMap<String, String, String, Integer> benchPutToNestedMap() { ObjObjObjObjHashMap<String, String, String, Integer> map = new ObjObjObjObjHashMap<>(); benchmarkPut(map::put); return map; } @Benchmark public Map<Triple<String, String, String>, Integer> benchPutToTupleMap() { Map<Triple<String, String, String>, Integer> map = new HashMap<>(); benchmarkPut((key1, key2, key3, value) -> map.put(ImmutableTriple.of(key1, key2, key3), value)); return map; } @Benchmark public Map<Triple<String, String, String>, Integer> benchPutToTupleKolobokeMap() { HashObjObjMap<Triple<String, String, String>, Integer> map = HashObjObjMaps.newMutableMap(); benchmarkPut((key1, key2, key3, value) -> map.put(ImmutableTriple.of(key1, key2, key3), value)); return map; } private List<Integer> benchmarkGet(MapValueSupplier<Integer> mapValueSupplier) { List<Integer> result = new ArrayList<>(N); for (int i = 0; i < N; i++) { result.add(mapValueSupplier.supply("a-" + i, "b-" + i, "c-" + i)); } return result; } private void benchmarkPut(PutValueFunction<Integer> putValueFunction) { for (int i = 0; i < N; i++) { putValueFunction.apply("a-" + i, "b-" + i, "c-" + i, i); } } private interface MapValueSupplier<T> { T supply(String key1, String key2, String key3); } private interface PutValueFunction<T> { void apply(String key1, String key2, String key3, T value); } }

Note: please do not suggest using primitive maps. An integer like (value) is just an example of a cheap object.

Questions

Why is the kolobok map 2.5 times slower than the jdk card?
Why aren't nested cards faster? (I expect the distribution overhead for the key tuple to be larger.)
Or is my test incorrect? Then, how can I improve this?

Update

Based on good tips from @leventov, I changed Benchmark and also tried the Triple implementation, which caches the hash code (and has a better distribution) - tests are called Tuple2.

 @State(Scope.Thread) @OutputTimeUnit(TimeUnit.NANOSECONDS) @BenchmarkMode(Mode.AverageTime) @OperationsPerInvocation(TupleVsNestedMapsBenchmark.TOTAL_OPS) @Fork(1) @Warmup(iterations = 5) @Measurement(iterations = 20) public class TupleVsNestedMapsBenchmark { static final int N = 30; static final int TOTAL_OPS = N * N * N; private ObjObjObjObjHashMap<String, String, String, Integer> sourceNestedMap; private Map<Triple<String, String, String>, Integer> sourceTupleMap; private HashObjObjMap<Triple<String, String, String>, Integer> sourceTupleKMap; private Map<Triple<String, String, String>, Integer> sourceTuple2Map; private HashObjObjMap<Triple<String, String, String>, Integer> sourceTuple2KMap; private String[] keys; @Setup public void init() { sourceNestedMap = new ObjObjObjObjHashMap<>(); sourceTupleMap = new HashMap<>(TOTAL_OPS); sourceTupleKMap = HashObjObjMaps.newMutableMap(TOTAL_OPS); sourceTuple2Map = new HashMap<>(TOTAL_OPS); sourceTuple2KMap = HashObjObjMaps.newMutableMap(TOTAL_OPS); keys = new String[N]; for (int i = 0; i < N; i++) { keys[i] = "k" + i; } for (int i = 0; i < N; i++) { for (int j = 0; j < N; j++) { for (int k = 0; k < N; k++) { sourceNestedMap.put(keys[i], keys[j], keys[k], i); sourceTupleMap.put(ImmutableTriple.of(keys[i], keys[j], keys[k]), i); sourceTupleKMap.put(ImmutableTriple.of(keys[i], keys[j], keys[k]), i); sourceTuple2Map.put(ImmutableTriple2.of(keys[i], keys[j], keys[k]), i); sourceTuple2KMap.put(ImmutableTriple2.of(keys[i], keys[j], keys[k]), i); } } } } @Benchmark public List<Integer> benchGetFromNestedMap() { return benchmarkGet(sourceNestedMap::get); } @Benchmark public List<Integer> benchGetFromTupleMap() { return benchmarkGet(((key1, key2, key3) -> sourceTupleMap.get(ImmutableTriple.of(key1, key2, key3)))); } @Benchmark public List<Integer> benchGetFromTupleKolobokeMap() { return benchmarkGet(((key1, key2, key3) -> sourceTupleKMap.get(ImmutableTriple.of(key1, key2, key3)))); } @Benchmark public List<Integer> benchGetFromTuple2Map() { return benchmarkGet(((key1, key2, key3) -> sourceTuple2Map.get(ImmutableTriple2.of(key1, key2, key3)))); } @Benchmark public List<Integer> benchGetFromTuple2KolobokeMap() { return benchmarkGet(((key1, key2, key3) -> sourceTuple2KMap.get(ImmutableTriple2.of(key1, key2, key3)))); } @Benchmark public ObjObjObjObjHashMap<String, String, String, Integer> benchPutToNestedMap() { ObjObjObjObjHashMap<String, String, String, Integer> map = new ObjObjObjObjHashMap<>(); benchmarkPut(map::put); return map; } @Benchmark public Map<Triple<String, String, String>, Integer> benchPutToTupleMap() { Map<Triple<String, String, String>, Integer> map = new HashMap<>(); benchmarkPut((key1, key2, key3, value) -> map.put(ImmutableTriple.of(key1, key2, key3), value)); return map; } @Benchmark public Map<Triple<String, String, String>, Integer> benchPutToTupleKolobokeMap() { HashObjObjMap<Triple<String, String, String>, Integer> map = HashObjObjMaps.newMutableMap(); benchmarkPut((key1, key2, key3, value) -> map.put(ImmutableTriple.of(key1, key2, key3), value)); return map; } @Benchmark public Map<Triple<String, String, String>, Integer> benchPutToTuple2Map() { Map<Triple<String, String, String>, Integer> map = new HashMap<>(); benchmarkPut((key1, key2, key3, value) -> map.put(ImmutableTriple2.of(key1, key2, key3), value)); return map; } @Benchmark public Map<Triple<String, String, String>, Integer> benchPutToTuple2KolobokeMap() { HashObjObjMap<Triple<String, String, String>, Integer> map = HashObjObjMaps.newMutableMap(); benchmarkPut((key1, key2, key3, value) -> map.put(ImmutableTriple2.of(key1, key2, key3), value)); return map; } private List<Integer> benchmarkGet(MapValueSupplier<Integer> mapValueSupplier) { List<Integer> result = new ArrayList<>(TOTAL_OPS); for (int i = 0; i < N; i++) { for (int j = 0; j < N; j++) { for (int k = 0; k < N; k++) { Integer value = mapValueSupplier.supply(keys[i], keys[j], keys[k]); result.add(value); } } } return result; } private void benchmarkPut(PutValueFunction<Integer> putValueFunction) { Integer value = 1; for (int i = 0; i < N; i++) { for (int j = 0; j < N; j++) { for (int k = 0; k < N; k++) { putValueFunction.apply(keys[i], keys[j], keys[k], value); } } } } private interface MapValueSupplier<T> { T supply(String key1, String key2, String key3); } private interface PutValueFunction<T> { void apply(String key1, String key2, String key3, T value); } }

and the results are as follows:

 Benchmark Mode Cnt Score Error Units TupleVsNestedMapsBenchmark.benchGetFromNestedMap avgt 20 24.524 ± 0.144 ns/op TupleVsNestedMapsBenchmark.benchGetFromTuple2KolobokeMap avgt 20 65.604 ± 1.135 ns/op TupleVsNestedMapsBenchmark.benchGetFromTuple2Map avgt 20 22.653 ± 0.745 ns/op TupleVsNestedMapsBenchmark.benchGetFromTupleKolobokeMap avgt 20 34824.901 ± 1718.183 ns/op TupleVsNestedMapsBenchmark.benchGetFromTupleMap avgt 20 2565.835 ± 57.402 ns/op TupleVsNestedMapsBenchmark.benchPutToNestedMap avgt 20 43.160 ± 0.340 ns/op TupleVsNestedMapsBenchmark.benchPutToTuple2KolobokeMap avgt 20 237.300 ± 3.362 ns/op TupleVsNestedMapsBenchmark.benchPutToTuple2Map avgt 20 40.952 ± 0.535 ns/op TupleVsNestedMapsBenchmark.benchPutToTupleKolobokeMap avgt 20 52315.769 ± 399.769 ns/op TupleVsNestedMapsBenchmark.benchPutToTupleMap avgt 20 3205.538 ± 44.306 ns/op

Summary

The tuple approach can be very slow if the key class hash function is not cached and / or well distributed, especially for koloboke.
And as also concluded here (in this case (Obj-Obj)), java.util.HashMap is "extremely" fast.

+5

java performance hashmap koloboke

Milos gregor Aug 11 '15 at 12:27

source share

3 answers

List of problems with your metrics:

initialization is performed in a static area, it must be done using the @Setup and @State s methods
Heavy distribution within the test and line building! What are you actually measuring?
Note the error - N - 10K, but operationsPerInvocation - 100K, so the actual times are pretty depressing
Invalid String + hash code. Very bad Triple hash code, leads to some clustering in the hash tables
When testing a nested vs tuple, note that you have selected all parts of all keys to be unique, i.e. e. All nested cards are cards with one key. This is not what you wanted.

+1

leventov Aug 11 '15 at 19:11

source share

A triple as an abstraction is fine (at least I don't see a clearly better alternative, you can override the Apache Commons Triple abstract class to determine the best hashCode() function.

 final class ImmutableTriple<L, M, R> extends Triple<L, M, R> { public final L left; public final M middle; public final R right; private int h; public static <L, M, R> ImmutableTriple<L, M, R> of(L left, M middle, R right) { return new ImmutableTriple(left, middle, right); } public ImmutableTriple(L left, M middle, R right) { this.left = left; this.middle = middle; this.right = right; } public L getLeft() { return this.left; } public M getMiddle() { return this.middle; } public R getRight() { return this.right; } private int innerHash() { int h = left.hashCode(); h *= 1000003; h += middle.hashCode(); h *= 1000003; h += right.hashCode(); return (int) LongHashFunction.murmur_3().hashInt(h); } @Override public int hashCode() { return h != 0 ? h : (h = innerHash()); } }

0

leventov Aug 12 '15 at 14:57

source share

leventov · Accepted Answer · 2015-08-13T17:26:58+0000

[Answer the updated question.]

Well, there are still problems with the standards:

When creating the State life cycle, you must pass the state object to the benhcmark method as a parameter (see my code below).
Benchmarking put() should be done differently: 1) in the @Setup method, the collection must be created (with a sufficient capacity or size argument), 2) in another @Setup(Level.Invocation) method @Setup(Level.Invocation) , you should call collection.clear() 3) measure pure put() in the comparison method
You still highlight a lot in the benchmarking method. This may be your case, but it hides the collection performance contribution.

So what I wrote:

 package tests; import net.openhft.koloboke.collect.map.hash.HashObjObjMap; import net.openhft.koloboke.collect.map.hash.HashObjObjMaps; import org.apache.commons.lang3.tuple.Triple; import org.openjdk.jmh.annotations.*; import java.util.HashMap; import java.util.Map; import java.util.concurrent.TimeUnit; @OutputTimeUnit(TimeUnit.NANOSECONDS) @BenchmarkMode(Mode.AverageTime) @Fork(1) @Threads(1) @Warmup(iterations = 10) @Measurement(iterations = 20) @State(Scope.Thread) public class SoMultiMap { public static final int N = Integer.getInteger("runs", 100000); private static final double kbk = Double.parseDouble(System.getProperty("kbk", "1.0")); static class ImmutableTriple<L, M, R> extends Triple<L, M, R> { public final L left; public final M middle; public final R right; private int h; public static <L, M, R> ImmutableTriple<L, M, R> of(L left, M middle, R right) { return new ImmutableTriple(left, middle, right); } public ImmutableTriple(L left, M middle, R right) { this.left = left; this.middle = middle; this.right = right; } public L getLeft() { return this.left; } public M getMiddle() { return this.middle; } public R getRight() { return this.right; } private int innerHash() { int h = left.hashCode(); h *= 1000003; h += middle.hashCode(); h *= 1000003; h += right.hashCode(); return h * 1000003; } @Override public int hashCode() { return h != 0 ? h : (h = innerHash()); } @Override public boolean equals(Object obj) { if (!(obj instanceof ImmutableTriple)) return super.equals(obj); ImmutableTriple triple = (ImmutableTriple) obj; if (h != 0 && triple.h != 0 && h != triple.h) return false; return super.equals(obj); } } ImmutableTriple<String, String, String>[] keys = new ImmutableTriple[N]; Integer[] values = new Integer[N]; Map<Triple<String, String, String>, Integer> sourceTupleMap; HashObjObjMap<Triple<String, String, String>, Integer> sourceTupleKMap; @Setup public void fill() { sourceTupleMap = new HashMap<>((int) (N / 0.75)); sourceTupleKMap = HashObjObjMaps.newUpdatableMap((int) (N * kbk)); for (int i = 0; i < N; i++) { keys[i] = ImmutableTriple.of("a-" + i, "b-" + i, "c-" + i); values[i] = i; sourceTupleKMap.put(keys[i], values[i]); sourceTupleMap.put(keys[i], values[i]); } } @Benchmark public int tupleHashMapGet(SoMultiMap st) { ImmutableTriple<String, String, String>[] keys = st.keys; Map<Triple<String, String, String>, Integer> map = st.sourceTupleMap; int s = 0; for (int i = 0; i < N; i++) { s += map.get(keys[i]); } return s; } @Benchmark public int tupleKolobokeGet(SoMultiMap st) { ImmutableTriple<String, String, String>[] keys = st.keys; HashObjObjMap<Triple<String, String, String>, Integer> map = st.sourceTupleKMap; int s = 0; for (int i = 0; i < N; i++) { s += map.get(keys[i]); } return s; } public static void main(String[] args) { SoMultiMap st = new SoMultiMap(); st.fill(); st.tupleKolobokeGet(st); st.tupleHashMapGet(st); } }

Now, interestingly, there are results:

With Java 7u55:

 HashMap: 65 +- 6 ns/op Koloboke: 46 +- 2

With Java 8u51:

 HashMap: 42 +- 0.5 Koloboke: 49 +- 1

So, we have some change in VM, something in between, which made HashMap much faster, and Koloboke cards a little slower. This requires an investigation, for which I do not have time right now. See https://github.com/OpenHFT/Koloboke/issues/42

Also pay attention to a couple of things:

Running tests on the VM server
Disable CPU scaling at runtime
close heavy applications (browser, Intellij, etc.) if you do not have 16+ hardware threads

Multi Key Maps - Performance Comparison

Context

Problem

Expectations

results

Benchmark

Questions

Update

Summary

More articles: