HashMap with 8 million entries is getting slow

I have a HashMap with 8 million entries, mapping Point2D keys to LinkedLists of road edges:

private Map<Point2D,LinkedList<RoadEdge>> adjacencyList; 

Everything works as it should, but retrieving data from the HashMap takes a very long time. Is there an alternative I can use to optimize data retrieval?

I am ready to trade time spent in put() for faster get().

+7
java hashmap java-8
3 answers

First of all, you need to check your hash code distribution. The key's hashCode is not used as-is: inside the map it is spread again by this function:

 static final int hash(Object key) {
     int h;
     return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
 }

So to judge the distribution, you should really run your hash codes through this function first, and then check how they spread across buckets.
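For example, here is a small standalone sketch of such a check (the 1,000 sample hash codes are a stand-in for your real Point2D.hashCode() values, and the 16-bucket table size is chosen only for illustration):

```java
public class HashSpread {

    // The same spreading step HashMap applies internally:
    // XOR the high 16 bits of the hash into the low 16 bits.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Histogram of bucket indices for the given hash codes, using
    // HashMap's index computation: spread(h) & (tableSize - 1).
    static int[] histogram(int[] hashCodes, int buckets) {
        int[] counts = new int[buckets];
        for (int h : hashCodes) {
            counts[spread(h) & (buckets - 1)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] hashes = new int[1_000];
        for (int i = 0; i < hashes.length; i++) {
            hashes[i] = Integer.hashCode(i * 31); // stand-in for Point2D.hashCode()
        }
        for (int c : histogram(hashes, 16)) {
            System.out.println(c); // well spread: roughly 1000 / 16 ≈ 62 per bucket
        }
    }
}
```

If the histogram is heavily skewed (a few buckets holding most of the entries), the key's hashCode is the problem, not the map.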

As a rule, with a good distribution, lookups are very fast even inside TreeNodes (which is what a bucket is converted into when it holds many entries). Even a bucket with Integer.MAX_VALUE entries needs no more than 32 steps to find one of them, because the bucket is transformed into a perfectly balanced red-black tree.

A search inside a Map is usually O(1). A search inside a bucket of TreeNodes is O(log n). And a search inside a LinkedList is O(n), much worse than either of the previous ones.

But this is only the time it takes to find one entry in the map. If you then need to find an element inside the LinkedList value, that adds extra time (worse than the map lookup itself).
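To make that contrast concrete, here is a small hypothetical sketch (the key type, list size, and contents are invented for illustration): the map lookup probes one bucket, while finding an element inside the LinkedList value walks the list node by node:

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

public class LookupCost {

    // O(n): walks the list node by node until the target is found.
    static int indexOf(List<String> list, String target) {
        int i = 0;
        for (String e : list) {
            if (e.equals(target)) {
                return i;
            }
            i++;
        }
        return -1;
    }

    public static void main(String[] args) {
        Map<Integer, LinkedList<String>> adjacency = new HashMap<>();
        LinkedList<String> edges = new LinkedList<>();
        for (int i = 0; i < 100_000; i++) {
            edges.add("edge-" + i);
        }
        adjacency.put(42, edges);

        // O(1) on average: one hash computation, one bucket probe.
        LinkedList<String> list = adjacency.get(42);

        // O(n): up to 100_000 node hops before the match is found.
        System.out.println(indexOf(list, "edge-99999"));
    }
}
```

If most of your time is spent inside the list rather than the map, switching the map implementation will not help.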

Also, for a large number of entries, it is important to specify loadFactor and initialCapacity (at least initialCapacity) before putting items into the map. This avoids re-hashing and (potentially) moving entries to other buckets as the map grows. If you put them all in first and afterwards only look them up, without modifying the Map, this is not a problem for searches...
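A minimal sketch of that pre-sizing (the 8_000_000 figure comes from the question, 0.75f is HashMap's default load factor, and the key/value types here are just placeholders):

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

public class Presize {

    // A capacity such that expectedEntries / capacity stays below the
    // load factor, so no rehash happens while the map is being populated.
    static int capacityFor(int expectedEntries, float loadFactor) {
        return (int) (expectedEntries / loadFactor) + 1;
    }

    public static void main(String[] args) {
        int expectedEntries = 8_000_000;
        float loadFactor = 0.75f;
        Map<String, LinkedList<Integer>> adjacencyList =
                new HashMap<>(capacityFor(expectedEntries, loadFactor), loadFactor);
        // ... populate once, then only call get() afterwards
        System.out.println(capacityFor(expectedEntries, loadFactor));
    }
}
```

Without this hint, loading 8 million entries into a default-sized HashMap triggers a long series of resizes, each of which re-buckets every entry.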

But as a rule, unless you measure, and measure correctly, this may not be the problem you are actually facing. You may be looking in the wrong direction.

+8

First of all, check the hash function of your key (Point2D). It must be well distributed, or you will have a lot of collisions. See Eugene's answer for an explanation of the beauty that is HashMap.
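As a sketch of what a well-behaved key can look like (assuming a simple immutable point class; your actual Point2D may already do something equivalent), equals and hashCode must agree, and the hash should mix both coordinates:

```java
public final class PointKey {

    private final double x;
    private final double y;

    public PointKey(double x, double y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof PointKey)) {
            return false;
        }
        PointKey p = (PointKey) o;
        // doubleToLongBits keeps equals consistent with hashCode
        // (and makes a NaN coordinate equal to itself).
        return Double.doubleToLongBits(x) == Double.doubleToLongBits(p.x)
                && Double.doubleToLongBits(y) == Double.doubleToLongBits(p.y);
    }

    @Override
    public int hashCode() {
        // Mix both coordinates; the multiplier keeps (x, y) and (y, x)
        // from colliding systematically.
        return 31 * Double.hashCode(x) + Double.hashCode(y);
    }
}
```

A hashCode that ignores one coordinate, or truncates doubles to small ints, would funnel millions of points into a handful of buckets.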

Secondly, depending on how you access the data (I mean: if you repeatedly fetch the same objects), you can put a cache in front of the map, but do this only if you cannot change hashCode.

Here is my LRU cache implementation from another project. It can speed some things up (it is not great code and does not avoid unchecked casts, but it should work in most cases).

 package util;

 import java.util.HashMap;
 import java.util.LinkedHashMap;
 import java.util.Map;

 public class CachedHashMap<K, V> extends HashMap<K, V> {

     private static final long serialVersionUID = 36215319740415714L;

     private LRUCache<K, V> cache;
     private LRUCache<K, Boolean> containsCache;

     public CachedHashMap(int cacheSize) {
         containsCache = new LRUCache<>(2 * cacheSize);
         cache = new LRUCache<>(cacheSize);
     }

     @Override
     public V put(K key, V value) {
         cache.put(key, value);
         containsCache.put(key, true);
         return super.put(key, value);
     }

     @Override
     public boolean containsKey(Object key) {
         if (containsCache.containsKey(key)) {
             return containsCache.get(key);
         }
         boolean contains = super.containsKey(key);
         containsCache.put((K) key, contains);
         return contains;
     }

     @Override
     public V remove(Object key) {
         containsCache.remove(key);
         cache.remove(key);
         return super.remove(key);
     }

     @Override
     public V get(Object key) {
         if (containsCache.containsKey(key)) {
             if (!containsCache.get(key)) {
                 return null;
             }
         }
         V value = cache.get(key);
         if (value != null) {
             return value;
         }
         value = super.get(key);
         K keyCast = (K) key;
         if (value != null) {
             containsCache.put(keyCast, true);
             cache.put(keyCast, value);
         } else {
             containsCache.put(keyCast, false);
         }
         return value;
     }

     class LRUCache<A, B> extends LinkedHashMap<A, B> {

         private static final long serialVersionUID = -5958373482978477783L;

         private int cacheSize;

         public LRUCache(int cacheSize) {
             super(16, 0.75f, true);
             this.cacheSize = cacheSize;
         }

         @Override
         protected boolean removeEldestEntry(Map.Entry<A, B> eldest) {
             return size() >= cacheSize;
         }
     }
 }
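The eviction in the nested LRUCache comes from LinkedHashMap's access-order mode combined with removeEldestEntry; here is a minimal standalone illustration of that mechanism (key/value types and sizes are invented for the demo):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruDemo {

    // A LinkedHashMap in access order (third constructor argument = true)
    // that evicts its least recently used entry once it grows past cacheSize.
    static <K, V> Map<K, V> newLru(int cacheSize) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > cacheSize;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, Integer> lru = newLru(3);
        lru.put("a", 1);
        lru.put("b", 2);
        lru.put("c", 3);
        lru.get("a");    // touch "a" so "b" becomes the eldest entry
        lru.put("d", 4); // evicts "b"
        System.out.println(lru.keySet()); // [c, a, d]
    }
}
```

Note that this sketch evicts with `size() > cacheSize`; the answer's version uses `>=`, which effectively keeps one entry fewer than the stated size.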
+3

To prove the point, here is a jmh benchmark that looks up an entry in maps containing 10, 10_000 and 10_000_000 entries. The results show that the lookup is constant time, O(1); the problem is elsewhere in your code. Even if you do not follow the code, the results are just readable numbers (see the end).

 import java.math.BigDecimal;
 import java.math.RoundingMode;
 import java.util.HashMap;
 import java.util.Map;
 import java.util.concurrent.ThreadLocalRandom;
 import java.util.concurrent.TimeUnit;

 import org.openjdk.jmh.annotations.Benchmark;
 import org.openjdk.jmh.annotations.BenchmarkMode;
 import org.openjdk.jmh.annotations.Fork;
 import org.openjdk.jmh.annotations.Level;
 import org.openjdk.jmh.annotations.Measurement;
 import org.openjdk.jmh.annotations.OutputTimeUnit;
 import org.openjdk.jmh.annotations.Scope;
 import org.openjdk.jmh.annotations.Setup;
 import org.openjdk.jmh.annotations.State;
 import org.openjdk.jmh.annotations.TearDown;
 import org.openjdk.jmh.annotations.Warmup;
 import org.openjdk.jmh.runner.Runner;
 import org.openjdk.jmh.runner.RunnerException;
 import org.openjdk.jmh.runner.options.Options;
 import org.openjdk.jmh.runner.options.OptionsBuilder;

 import javafx.geometry.Point2D;

 @BenchmarkMode(org.openjdk.jmh.annotations.Mode.AverageTime)
 @OutputTimeUnit(TimeUnit.NANOSECONDS)
 @Warmup(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
 @Measurement(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
 public class TwoMapsTest {

     public static void main(String[] args) throws RunnerException {
         Options opt = new OptionsBuilder().include(TwoMapsTest.class.getSimpleName())
                 .jvmArgs("-ea", "-Xms10g", "-Xmx10g")
                 .shouldFailOnError(true)
                 .build();
         new Runner(opt).run();
     }

     // The check below, assert map.size() == numberOfElements, can absolutely
     // fail if two random points collide; fingers crossed that it does not.

     @State(Scope.Thread)
     public static class Map_10 {

         int numberOfElements = 10;

         public Map<Point2D, Integer> map = new HashMap<>();
         public Point2D mightBePresent = null;
         public Point2D isPresent = null;

         @Setup(Level.Iteration)
         public void setUp() {
             int randomInsideHowManyBoundry = ThreadLocalRandom.current().nextInt(numberOfElements);
             for (int i = 0; i < numberOfElements; ++i) {
                 double[] d = generateTwoPoints(-3.0, 3.9999, 4);
                 Point2D p = new Point2D(d[0], d[1]);
                 if (isPresent == null && i == randomInsideHowManyBoundry) {
                     isPresent = new Point2D(d[0], d[1]);
                 }
                 map.put(p, i);
             }
             assert map.containsKey(isPresent);
             assert map.size() == numberOfElements;
         }

         @TearDown(Level.Iteration)
         public void tearDown() {
             map.clear();
         }
     }

     @State(Scope.Thread)
     public static class Map_10_000 {

         int numberOfElements = 10_000;

         public Map<Point2D, Integer> map = new HashMap<>();
         public Point2D mightBePresent = null;
         public Point2D isPresent = null;

         @Setup(Level.Iteration)
         public void setUp() {
             int randomInsideHowManyBoundry = ThreadLocalRandom.current().nextInt(numberOfElements);
             for (int i = 0; i < numberOfElements; ++i) {
                 double[] d = generateTwoPoints(-3.0, 3.9999, 4);
                 Point2D p = new Point2D(d[0], d[1]);
                 if (isPresent == null && i == randomInsideHowManyBoundry) {
                     isPresent = new Point2D(d[0], d[1]);
                 }
                 map.put(p, i);
             }
             assert map.containsKey(isPresent);
             assert map.size() == numberOfElements;
         }

         @TearDown(Level.Iteration)
         public void tearDown() {
             map.clear();
         }
     }

     @State(Scope.Thread)
     public static class Map_10_000_000 {

         int numberOfElements = 10_000_000;

         public Map<Point2D, Integer> map = new HashMap<>();
         public Point2D mightBePresent = null;
         public Point2D isPresent = null;

         @Setup(Level.Iteration)
         public void setUp() {
             int randomInsideHowManyBoundry = ThreadLocalRandom.current().nextInt(numberOfElements);
             for (int i = 0; i < numberOfElements; ++i) {
                 double[] d = generateTwoPoints(-3.0, 3.9999, 4);
                 Point2D p = new Point2D(d[0], d[1]);
                 if (isPresent == null && i == randomInsideHowManyBoundry) {
                     isPresent = new Point2D(d[0], d[1]);
                 }
                 map.put(p, i);
             }
             assert map.containsKey(isPresent);
             assert map.size() == numberOfElements;
         }

         @TearDown(Level.Iteration)
         public void tearDown() {
             map.clear();
         }
     }

     @Fork(1)
     @Benchmark
     public int mightBePresentMap_10(Map_10 map) {
         Integer x = map.map.get(map.mightBePresent);
         return x == null ? -1 : x;
     }

     @Fork(1)
     @Benchmark
     public int isPresentMap_10(Map_10 map) {
         Integer x = map.map.get(map.isPresent);
         return x == null ? -1 : x;
     }

     @Fork(1)
     @Benchmark
     public int mightBePresentMap_10_000(Map_10_000 map) {
         Integer x = map.map.get(map.mightBePresent);
         return x == null ? -1 : x;
     }

     @Fork(1)
     @Benchmark
     public int isPresentMap_10_000(Map_10_000 map) {
         Integer x = map.map.get(map.isPresent);
         return x == null ? -1 : x;
     }

     @Fork(1)
     @Benchmark
     public int mightBePresentMap_10_000_000(Map_10_000_000 map) {
         Integer x = map.map.get(map.mightBePresent);
         return x == null ? -1 : x;
     }

     @Fork(1)
     @Benchmark
     public int isPresentMap_10_000_000(Map_10_000_000 map) {
         Integer x = map.map.get(map.isPresent);
         return x == null ? -1 : x;
     }

     private static double[] generateTwoPoints(double lowerBound, double upperBound, int precision) {
         double x = ThreadLocalRandom.current().nextDouble(lowerBound, upperBound);
         x = BigDecimal.valueOf(x).setScale(precision, RoundingMode.HALF_UP).doubleValue();
         double y = ThreadLocalRandom.current().nextDouble(lowerBound, upperBound);
         y = BigDecimal.valueOf(y).setScale(precision, RoundingMode.HALF_UP).doubleValue();
         return new double[] { x, y };
     }
 }

And the actual results:

 Benchmark                                          (howManyEntries)  Mode  Cnt   Score   Error  Units
 hashmap8Millions.EightMillionsTest.isPresent                      1  avgt    5   8.787 ± 0.259  ns/op
 hashmap8Millions.EightMillionsTest.isPresent                     10  avgt    5   9.988 ± 0.283  ns/op
 hashmap8Millions.EightMillionsTest.isPresent                    256  avgt    5   9.146 ± 2.081  ns/op
 hashmap8Millions.EightMillionsTest.isPresent                  10000  avgt    5   8.871 ± 0.574  ns/op
 hashmap8Millions.EightMillionsTest.isPresent                1000000  avgt    5   8.894 ± 0.676  ns/op
 hashmap8Millions.EightMillionsTest.isPresent               10000000  avgt    5  10.884 ± 5.623  ns/op
 hashmap8Millions.EightMillionsTest.mightBePresent                 1  avgt    5   4.607 ± 0.175  ns/op
 hashmap8Millions.EightMillionsTest.mightBePresent                10  avgt    5   4.713 ± 0.944  ns/op
 hashmap8Millions.EightMillionsTest.mightBePresent               256  avgt    5   5.283 ± 0.511  ns/op
 hashmap8Millions.EightMillionsTest.mightBePresent             10000  avgt    5   8.944 ± 0.144  ns/op
 hashmap8Millions.EightMillionsTest.mightBePresent           1000000  avgt    5  10.256 ± 0.121  ns/op
 hashmap8Millions.EightMillionsTest.mightBePresent          10000000  avgt    5   8.764 ± 0.504  ns/op

Note that these times are in nanoseconds; that is nowhere near slow...

+2
