What is the optimal capacity and load factor for a fixed-size HashMap?

Question

What is the optimal capacity and load factor for a fixed-size HashMap?

I am trying to figure out the optimal capacity and load factor for a particular case. I think I understood the essence of this, but I would still be grateful for confirmation from someone better known than me. :)

If I know that my HashMap will fill up to contain, say, 100 objects, and spend most of the time on 100 objects, I assume that the optimal values are an initial capacity of 100 and a load factor of 1? Or do I need a capacity of 101, or are there any other gotchas?

EDIT: Well, I put off a few hours and did some testing. Here are the results:

It is curious that performance, capacity + 1, capacity + 2, capacity-1, and even power -10 all give exactly the same results. I would expect at least capacity-1 and power-10 to give worse results.
Using initial capacity (as opposed to using the default value of 16) gives a noticeable improvement to put () - up to 30% faster.
Using a load factor of 1 gives equal performance for a small number of objects and better performance for a larger number of objects (> 100000). However, this does not improve in proportion to the number of objects; I suspect there is an additional factor affecting the results. Performance
get () is slightly different for a different number of objects / capacity, but although it may vary slightly from case to case, it usually does not depend on the initial capacity or load factor.

EDIT2: adding some graphs on my part. Here is an example of the difference between a load factor of 0.75 and 1 when I initialize the HashMap and fill it to full capacity. On the y scale, time is in ms (lower is better), and x is size (number of objects). As the size varies linearly, the required time also grows linearly.

So, let's see what I did. The following two diagrams show the difference in load factors. The first graph shows what happens when the HashMap fills to full capacity; a load factor of 0.75 works worse due to resizing. However, this is not always worse, and there are all kinds of kicks and jumps - I think the GC has a great game in this. A load factor of 1.25 does the same as 1, so it is not included in the diagram.

fully filled

This chart proves that 0.75 is worse due to resizing; if we fill the HashMap to half the capacity, 0.75 is no worse, just ... different (and it should use less memory and have an imperceptibly better iteration performance).

half full

Another thing I want to show. This is the performance for all three load factors and different sizes of HashMap. Constantly constant with a slight change, with the exception of one peak for load factor 1. I really want to know what it is (maybe GC, but who knows).

go spike

And here is the code for those interested:

import java.util.HashMap; import java.util.Map; public class HashMapTest { // capacity - numbers high as 10000000 require -mx1536m -ms1536m JVM parameters public static final int CAPACITY = 10000000; public static final int ITERATIONS = 10000; // set to false to print put performance, or to true to print get performance boolean doIterations = false; private Map<Integer, String> cache; public void fillCache(int capacity) { long t = System.currentTimeMillis(); for (int i = 0; i <= capacity; i++) cache.put(i, "Value number " + i); if (!doIterations) { System.out.print(System.currentTimeMillis() - t); System.out.print("\t"); } } public void iterate(int capacity) { long t = System.currentTimeMillis(); for (int i = 0; i <= ITERATIONS; i++) { long x = Math.round(Math.random() * capacity); String result = cache.get((int) x); } if (doIterations) { System.out.print(System.currentTimeMillis() - t); System.out.print("\t"); } } public void test(float loadFactor, int divider) { for (int i = 10000; i <= CAPACITY; i+= 10000) { cache = new HashMap<Integer, String>(i, loadFactor); fillCache(i / divider); if (doIterations) iterate(i / divider); } System.out.println(); } public static void main(String[] args) { HashMapTest test = new HashMapTest(); // fill to capacity test.test(0.75f, 1); test.test(1, 1); test.test(1.25f, 1); // fill to half capacity test.test(0.75f, 2); test.test(1, 2); test.test(1.25f, 2); } }

+72

java hashmap

Domchi Aug 18 '11 at 23:40

source share

4 answers

, , . You said:

, , + 1, + 2, -1 -10 . , , -1 -10, .

. , , , 513, 600, 700, 800, 900, 1000 1024 (1024). , @G_H, , . .

JDK:

 /** * Constructs an empty <tt>HashMap</tt> with the specified initial * capacity and load factor. * * @param initialCapacity the initial capacity * @param loadFactor the load factor * @throws IllegalArgumentException if the initial capacity is negative * or the load factor is nonpositive */ public HashMap(int initialCapacity, float loadFactor) { if (initialCapacity < 0) throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity); if (initialCapacity > MAXIMUM_CAPACITY) initialCapacity = MAXIMUM_CAPACITY; if (loadFactor <= 0 || Float.isNaN(loadFactor)) throw new IllegalArgumentException("Illegal load factor: " + loadFactor); // Find a power of 2 >= initialCapacity int capacity = 1; while (capacity < initialCapacity) capacity <<= 1; this.loadFactor = loadFactor; threshold = (int)(capacity * loadFactor); table = new Entry[capacity]; init(); }

+9

durron597 26 . '13 20:39

source share

101 . , , , - , .

... 1 .

: .

-, , HashMap 100 ; , , . , , , . , , . , , ; those. .

-, 101 , ... , 100 , 100 , Javadoc, , , 100 . , , . ... 101 , , java.util.HashMap . Hoorah.

-, , HashMap , 1 " , " , .

... n , n n , , , ... ... . , . , , n n/0.75 .

...

:

 static Random r = new Random(); public static void main(String[] args){ int[] tests = {100, 1000, 10000}; int runs = 5000; float lf_sta = 1f; float lf_dyn = 0.75f; for(int t:tests){ System.err.println("=======Test Put "+t+""); HashMap<Integer,Integer> map = new HashMap<Integer,Integer>(); long norm_put = testInserts(map, t, runs); System.err.print("Norm put:"+norm_put+" ms. "); int cap_sta = t; map = new HashMap<Integer,Integer>(cap_sta, lf_sta); long sta_put = testInserts(map, t, runs); System.err.print("Static put:"+sta_put+" ms. "); int cap_dyn = (int)Math.ceil((float)t/lf_dyn); map = new HashMap<Integer,Integer>(cap_dyn, lf_dyn); long dyn_put = testInserts(map, t, runs); System.err.println("Dynamic put:"+dyn_put+" ms. "); } for(int t:tests){ System.err.println("=======Test Get (hits) "+t+""); HashMap<Integer,Integer> map = new HashMap<Integer,Integer>(); fill(map, t); long norm_get_hits = testGetHits(map, t, runs); System.err.print("Norm get (hits):"+norm_get_hits+" ms. "); int cap_sta = t; map = new HashMap<Integer,Integer>(cap_sta, lf_sta); fill(map, t); long sta_get_hits = testGetHits(map, t, runs); System.err.print("Static get (hits):"+sta_get_hits+" ms. "); int cap_dyn = (int)Math.ceil((float)t/lf_dyn); map = new HashMap<Integer,Integer>(cap_dyn, lf_dyn); fill(map, t); long dyn_get_hits = testGetHits(map, t, runs); System.err.println("Dynamic get (hits):"+dyn_get_hits+" ms. "); } for(int t:tests){ System.err.println("=======Test Get (Rand) "+t+""); HashMap<Integer,Integer> map = new HashMap<Integer,Integer>(); fill(map, t); long norm_get_rand = testGetRand(map, t, runs); System.err.print("Norm get (rand):"+norm_get_rand+" ms. "); int cap_sta = t; map = new HashMap<Integer,Integer>(cap_sta, lf_sta); fill(map, t); long sta_get_rand = testGetRand(map, t, runs); System.err.print("Static get (rand):"+sta_get_rand+" ms. "); int cap_dyn = (int)Math.ceil((float)t/lf_dyn); map = new HashMap<Integer,Integer>(cap_dyn, lf_dyn); fill(map, t); long dyn_get_rand = testGetRand(map, t, runs); System.err.println("Dynamic get (rand):"+dyn_get_rand+" ms. "); } } public static long testInserts(HashMap<Integer,Integer> map, int test, int runs){ long b4 = System.currentTimeMillis(); for(int i=0; i<runs; i++){ fill(map, test); map.clear(); } return System.currentTimeMillis()-b4; } public static void fill(HashMap<Integer,Integer> map, int test){ for(int j=0; j<test; j++){ if(map.put(r.nextInt(), j)!=null){ j--; } } } public static long testGetHits(HashMap<Integer,Integer> map, int test, int runs){ long b4 = System.currentTimeMillis(); ArrayList<Integer> keys = new ArrayList<Integer>(); keys.addAll(map.keySet()); for(int i=0; i<runs; i++){ for(int j=0; j<test; j++){ keys.get(r.nextInt(keys.size())); } } return System.currentTimeMillis()-b4; } public static long testGetRand(HashMap<Integer,Integer> map, int test, int runs){ long b4 = System.currentTimeMillis(); for(int i=0; i<runs; i++){ for(int j=0; j<test; j++){ map.get(r.nextInt()); } } return System.currentTimeMillis()-b4; }

:

 =======Test Put 100 Norm put:78 ms. Static put:78 ms. Dynamic put:62 ms. =======Test Put 1000 Norm put:764 ms. Static put:763 ms. Dynamic put:748 ms. =======Test Put 10000 Norm put:12921 ms. Static put:12889 ms. Dynamic put:12873 ms. =======Test Get (hits) 100 Norm get (hits):47 ms. Static get (hits):31 ms. Dynamic get (hits):32 ms. =======Test Get (hits) 1000 Norm get (hits):327 ms. Static get (hits):328 ms. Dynamic get (hits):343 ms. =======Test Get (hits) 10000 Norm get (hits):3304 ms. Static get (hits):3366 ms. Dynamic get (hits):3413 ms. =======Test Get (Rand) 100 Norm get (rand):63 ms. Static get (rand):46 ms. Dynamic get (rand):47 ms. =======Test Get (Rand) 1000 Norm get (rand):483 ms. Static get (rand):499 ms. Dynamic get (rand):483 ms. =======Test Get (Rand) 10000 Norm get (rand):5190 ms. Static get (rand):5362 ms. Dynamic get (rand):5236 ms.

re: & uarr; — → || & larr; .

( ), glib, .

+2

badroit 18 . '11 23:48

source share

HashMap JavaDoc:

, (.75) . , ( HashMap, get put). , . , , .

, 100 , 0,75 (100/0,75). 134.

, , . , HashMap "" , , ? -, . , , - , O (1) ?

EDIT: ... , - - . , . , , , . , ( +1) , 1, , . . Resizing is still relatively quick and can happen only once, while searches are performed for almost any relevant map operation. As a result, optimizations for quick searches are what you really want here. You can combine this without resizing, making a JavaDoc: take the required capacity, divide by the optimal load factor (for example, 0.75) and use it as an initial capacity with this load factor. Add 1 to make sure rounding is not giving you.

+1

G_H Aug 18 '11 at 23:58

source share

G_H · Accepted Answer · 2011-08-22 22:09

Well, to put this thing at rest, I created a test application to run several scripts and get a visualization of the results. Here's how the tests run:

Several different sizes of collection were tested: one hundred, one thousand and one hundred thousand records.
The keys used are instances of the class that are uniquely identified by an identifier. Each test uses unique keys with the addition of integers in the form of identifiers. The equals method uses only an identifier, so no key mapping overwrites another.
The keys receive a hash code, which consists of the remainder of the module from their identifier to some preset number. We will call this number a hash limit. This allowed me to control the number of hash collisions that were expected. For example, if our collection size is 100, we will have keys with identifiers from 0 to 99. If the hash limit is 100, each key will have a unique hash code. If the hash limit is 50, key 0 will have the same hash code as key 50, 1 will have the same hash code as 51. In other words, the expected number of hash collisions per key is the collection size divided by on the hash limit.
For each combination of collection size and hash limit, I run a test using hash cards initialized with different settings. These parameters are the load factor and initial power, which is expressed as the collection tuning factor. For example, a test with a collection size of 100 and an initial capacity factor of 1.25 initializes a hash map with an initial capacity of 125.
The value for each key is just a new Object .
Each test result is encapsulated in an instance of the Result class. At the end of all tests, the results are ranked from worst overall performance to best.
The average time for puts and gets is calculated at 10 puts / gets.
All test combinations are run once to eliminate the impact of JIT compilation. After that, tests are performed for actual results.

Here's the class:

 package hashmaptest; import java.io.IOException; import java.util.ArrayList; import java.util.Collections; import java.util.HashMap; import java.util.List; public class HashMapTest { private static final List<Result> results = new ArrayList<Result>(); public static void main(String[] args) throws IOException { //First entry of each array is the sample collection size, subsequent entries //are the hash limits final int[][] sampleSizesAndHashLimits = new int[][] { {100, 50, 90, 100}, {1000, 500, 900, 990, 1000}, {100000, 10000, 90000, 99000, 100000} }; final double[] initialCapacityFactors = new double[] {0.5, 0.75, 1.0, 1.25, 1.5, 2.0}; final float[] loadFactors = new float[] {0.5f, 0.75f, 1.0f, 1.25f}; //Doing a warmup run to eliminate JIT influence for(int[] sizeAndLimits : sampleSizesAndHashLimits) { int size = sizeAndLimits[0]; for(int i = 1; i < sizeAndLimits.length; ++i) { int limit = sizeAndLimits[i]; for(double initCapacityFactor : initialCapacityFactors) { for(float loadFactor : loadFactors) { runTest(limit, size, initCapacityFactor, loadFactor); } } } } results.clear(); //Now for the real thing... for(int[] sizeAndLimits : sampleSizesAndHashLimits) { int size = sizeAndLimits[0]; for(int i = 1; i < sizeAndLimits.length; ++i) { int limit = sizeAndLimits[i]; for(double initCapacityFactor : initialCapacityFactors) { for(float loadFactor : loadFactors) { runTest(limit, size, initCapacityFactor, loadFactor); } } } } Collections.sort(results); for(final Result result : results) { result.printSummary(); } // ResultVisualizer.visualizeResults(results); } private static void runTest(final int hashLimit, final int sampleSize, final double initCapacityFactor, final float loadFactor) { final int initialCapacity = (int)(sampleSize * initCapacityFactor); System.out.println("Running test for a sample collection of size " + sampleSize + ", an initial capacity of " + initialCapacity + ", a load factor of " + loadFactor + " and keys with a hash code limited to " + hashLimit); System.out.println("===================="); double hashOverload = (((double)sampleSize/hashLimit) - 1.0) * 100.0; System.out.println("Hash code overload: " + hashOverload + "%"); //Generating our sample key collection. final List<Key> keys = generateSamples(hashLimit, sampleSize); //Generating our value collection final List<Object> values = generateValues(sampleSize); final HashMap<Key, Object> map = new HashMap<Key, Object>(initialCapacity, loadFactor); final long startPut = System.nanoTime(); for(int i = 0; i < sampleSize; ++i) { map.put(keys.get(i), values.get(i)); } final long endPut = System.nanoTime(); final long putTime = endPut - startPut; final long averagePutTime = putTime/(sampleSize/10); System.out.println("Time to map all keys to their values: " + putTime + " ns"); System.out.println("Average put time per 10 entries: " + averagePutTime + " ns"); final long startGet = System.nanoTime(); for(int i = 0; i < sampleSize; ++i) { map.get(keys.get(i)); } final long endGet = System.nanoTime(); final long getTime = endGet - startGet; final long averageGetTime = getTime/(sampleSize/10); System.out.println("Time to get the value for every key: " + getTime + " ns"); System.out.println("Average get time per 10 entries: " + averageGetTime + " ns"); System.out.println(""); final Result result = new Result(sampleSize, initialCapacity, loadFactor, hashOverload, averagePutTime, averageGetTime, hashLimit); results.add(result); //Haha, what kind of noob explicitly calls for garbage collection? System.gc(); try { Thread.sleep(200); } catch(final InterruptedException e) {} } private static List<Key> generateSamples(final int hashLimit, final int sampleSize) { final ArrayList<Key> result = new ArrayList<Key>(sampleSize); for(int i = 0; i < sampleSize; ++i) { result.add(new Key(i, hashLimit)); } return result; } private static List<Object> generateValues(final int sampleSize) { final ArrayList<Object> result = new ArrayList<Object>(sampleSize); for(int i = 0; i < sampleSize; ++i) { result.add(new Object()); } return result; } private static class Key { private final int hashCode; private final int id; Key(final int id, final int hashLimit) { //Equals implies same hashCode if limit is the same //Same hashCode doesn't necessarily implies equals this.id = id; this.hashCode = id % hashLimit; } @Override public int hashCode() { return hashCode; } @Override public boolean equals(final Object o) { return ((Key)o).id == this.id; } } static class Result implements Comparable<Result> { final int sampleSize; final int initialCapacity; final float loadFactor; final double hashOverloadPercentage; final long averagePutTime; final long averageGetTime; final int hashLimit; Result(final int sampleSize, final int initialCapacity, final float loadFactor, final double hashOverloadPercentage, final long averagePutTime, final long averageGetTime, final int hashLimit) { this.sampleSize = sampleSize; this.initialCapacity = initialCapacity; this.loadFactor = loadFactor; this.hashOverloadPercentage = hashOverloadPercentage; this.averagePutTime = averagePutTime; this.averageGetTime = averageGetTime; this.hashLimit = hashLimit; } @Override public int compareTo(final Result o) { final long putDiff = o.averagePutTime - this.averagePutTime; final long getDiff = o.averageGetTime - this.averageGetTime; return (int)(putDiff + getDiff); } void printSummary() { System.out.println("" + averagePutTime + " ns per 10 puts, " + averageGetTime + " ns per 10 gets, for a load factor of " + loadFactor + ", initial capacity of " + initialCapacity + " for " + sampleSize + " mappings and " + hashOverloadPercentage + "% hash code overload."); } } }

Running this may take some time. Results are printed as standard. You may have noticed that I commented on the line. This line invokes a visualizer that outputs visual representations of the results to png files. The class for this is given below. If you want to run it, uncomment the corresponding line in the above code. Be careful: the visualizer class assumes that you are running Windows and creating folders and files in C: \ temp. When working on a different platform, adjust this.

 package hashmaptest; import hashmaptest.HashMapTest.Result; import java.awt.Color; import java.awt.Graphics2D; import java.awt.image.BufferedImage; import java.io.File; import java.io.IOException; import java.text.DecimalFormat; import java.text.NumberFormat; import java.util.ArrayList; import java.util.Collections; import java.util.HashMap; import java.util.HashSet; import java.util.List; import java.util.Map; import java.util.Set; import javax.imageio.ImageIO; public class ResultVisualizer { private static final Map<Integer, Map<Integer, Set<Result>>> sampleSizeToHashLimit = new HashMap<Integer, Map<Integer, Set<Result>>>(); private static final DecimalFormat df = new DecimalFormat("0.00"); static void visualizeResults(final List<Result> results) throws IOException { final File tempFolder = new File("C:\\temp"); final File baseFolder = makeFolder(tempFolder, "hashmap_tests"); long bestPutTime = -1L; long worstPutTime = 0L; long bestGetTime = -1L; long worstGetTime = 0L; for(final Result result : results) { final Integer sampleSize = result.sampleSize; final Integer hashLimit = result.hashLimit; final long putTime = result.averagePutTime; final long getTime = result.averageGetTime; if(bestPutTime == -1L || putTime < bestPutTime) bestPutTime = putTime; if(bestGetTime <= -1.0f || getTime < bestGetTime) bestGetTime = getTime; if(putTime > worstPutTime) worstPutTime = putTime; if(getTime > worstGetTime) worstGetTime = getTime; Map<Integer, Set<Result>> hashLimitToResults = sampleSizeToHashLimit.get(sampleSize); if(hashLimitToResults == null) { hashLimitToResults = new HashMap<Integer, Set<Result>>(); sampleSizeToHashLimit.put(sampleSize, hashLimitToResults); } Set<Result> resultSet = hashLimitToResults.get(hashLimit); if(resultSet == null) { resultSet = new HashSet<Result>(); hashLimitToResults.put(hashLimit, resultSet); } resultSet.add(result); } System.out.println("Best average put time: " + bestPutTime + " ns"); System.out.println("Best average get time: " + bestGetTime + " ns"); System.out.println("Worst average put time: " + worstPutTime + " ns"); System.out.println("Worst average get time: " + worstGetTime + " ns"); for(final Integer sampleSize : sampleSizeToHashLimit.keySet()) { final File sizeFolder = makeFolder(baseFolder, "sample_size_" + sampleSize); final Map<Integer, Set<Result>> hashLimitToResults = sampleSizeToHashLimit.get(sampleSize); for(final Integer hashLimit : hashLimitToResults.keySet()) { final File limitFolder = makeFolder(sizeFolder, "hash_limit_" + hashLimit); final Set<Result> resultSet = hashLimitToResults.get(hashLimit); final Set<Float> loadFactorSet = new HashSet<Float>(); final Set<Integer> initialCapacitySet = new HashSet<Integer>(); for(final Result result : resultSet) { loadFactorSet.add(result.loadFactor); initialCapacitySet.add(result.initialCapacity); } final List<Float> loadFactors = new ArrayList<Float>(loadFactorSet); final List<Integer> initialCapacities = new ArrayList<Integer>(initialCapacitySet); Collections.sort(loadFactors); Collections.sort(initialCapacities); final BufferedImage putImage = renderMap(resultSet, loadFactors, initialCapacities, worstPutTime, bestPutTime, false); final BufferedImage getImage = renderMap(resultSet, loadFactors, initialCapacities, worstGetTime, bestGetTime, true); final String putFileName = "size_" + sampleSize + "_hlimit_" + hashLimit + "_puts.png"; final String getFileName = "size_" + sampleSize + "_hlimit_" + hashLimit + "_gets.png"; writeImage(putImage, limitFolder, putFileName); writeImage(getImage, limitFolder, getFileName); } } } private static File makeFolder(final File parent, final String folder) throws IOException { final File child = new File(parent, folder); if(!child.exists()) child.mkdir(); return child; } private static BufferedImage renderMap(final Set<Result> results, final List<Float> loadFactors, final List<Integer> initialCapacities, final float worst, final float best, final boolean get) { //[x][y] => x is mapped to initial capacity, y is mapped to load factor final Color[][] map = new Color[initialCapacities.size()][loadFactors.size()]; for(final Result result : results) { final int x = initialCapacities.indexOf(result.initialCapacity); final int y = loadFactors.indexOf(result.loadFactor); final float time = get ? result.averageGetTime : result.averagePutTime; final float score = (time - best)/(worst - best); final Color c = new Color(score, 1.0f - score, 0.0f); map[x][y] = c; } final int imageWidth = initialCapacities.size() * 40 + 50; final int imageHeight = loadFactors.size() * 40 + 50; final BufferedImage image = new BufferedImage(imageWidth, imageHeight, BufferedImage.TYPE_3BYTE_BGR); final Graphics2D g = image.createGraphics(); g.setColor(Color.WHITE); g.fillRect(0, 0, imageWidth, imageHeight); for(int x = 0; x < map.length; ++x) { for(int y = 0; y < map[x].length; ++y) { g.setColor(map[x][y]); g.fillRect(50 + x*40, imageHeight - 50 - (y+1)*40, 40, 40); g.setColor(Color.BLACK); g.drawLine(25, imageHeight - 50 - (y+1)*40, 50, imageHeight - 50 - (y+1)*40); final Float loadFactor = loadFactors.get(y); g.drawString(df.format(loadFactor), 10, imageHeight - 65 - (y)*40); } g.setColor(Color.BLACK); g.drawLine(50 + (x+1)*40, imageHeight - 50, 50 + (x+1)*40, imageHeight - 15); final int initialCapacity = initialCapacities.get(x); g.drawString(((initialCapacity%1000 == 0) ? "" + (initialCapacity/1000) + "K" : "" + initialCapacity), 15 + (x+1)*40, imageHeight - 25); } g.drawLine(25, imageHeight - 50, imageWidth, imageHeight - 50); g.drawLine(50, 0, 50, imageHeight - 25); g.dispose(); return image; } private static void writeImage(final BufferedImage image, final File folder, final String filename) throws IOException { final File imageFile = new File(folder, filename); ImageIO.write(image, "png", imageFile); } }

The visualized output is as follows:

Tests are divided first by the size of the collection, then by the hash limit.
For each test, there is an output image relative to the average start time (for 10 stats) and the average receive time (10 gets). Images are two-dimensional "heat maps" that show color for a combination of initial capacity and load factor.
The colors in the images are based on average time on a normalized scale from best to worst, from saturated green to saturated red. In other words, the best time will be completely green, and the worst time will be completely red. Two different dimensions of time should never have the same color.
Color maps are calculated separately for puts and gets, but cover all tests for the corresponding categories.
Visualizations show the initial capacitance along the x axis and the load factor along the y axis.

Without further ado, take a look at the results. I will start with the results for puts.

Put the results

Collection size: 100. Hash limit: 50. This means that every hash code must appear twice, and every other key collides in the hash map.

size_100_hlimit_50_puts

Well, it does not start very well. We see that there is a large access point for the initial capacity 25% higher than the collection size with a load factor of 1. The lower left corner does not work too well.

Collection size: 100. Hash limit: 90. One out of ten keys has a repeating hash code.

size_100_hlimit_90_puts

This is a slightly more realistic scenario, lacking a perfect hash function, but still 10% overload. The hot spot is gone, but the combination of low initial capacity with a low load factor obviously does not work.

Collection size: 100. Hash limit: 100. Each key is its own unique hash code. No collisions are expected if there are enough buckets.

size_100_hlimit_100_puts

An initial capacity of 100 with a load factor of 1 seems fine. Surprisingly, a higher initial capacity with a lower load factor is not always good.

Collection size: 1000. Hash limit: 500. Here it is becoming more serious, with 1000 entries. As in the first test, there is a hash overload from 2 to 1.

size_1000_hlimit_500_puts

The bottom left is still bad. But there seems to be a symmetry between the combined lower initial count / high load ratio and the higher initial count / low load ratio.

Collection size: 1000. Hash limit: 900. This means that one out of ten hash codes will be repeated twice. Intelligent collision scenario.

size_1000_hlimit_900_puts

There, something very funny happens with an unlikely combination of initial power that is too small with a load factor above 1, which is rather inconsistent. Otherwise, it is still pretty symmetrical.

Collection size: 1000. Hash limit: 990. Some collisions, but only a few. In this respect, it is quite realistic.

size_1000_hlimit_990_puts

We have a pretty symmetry. The lower left corner is still not optimal, but the load factor of 1000 power units / 1.0 and the load factor of 1250 / 0.75 load are at the same level.

Collection size: 1000. Hash limit: 1000. There are no duplicate hash codes, but now with a sample size of 1000.

size_1000_hlimit_1000_puts

Nothing to say here. The combination of a higher initial capacity with a load factor of 0.75 seems to be slightly superior to the combination of 1000 initial powers with a load factor of 1.

Collection Size: 100_000. Hash Limit: 10_000. Well, now this is getting serious, with a sample size of one hundred thousand and 100 hash codes for each key.

size_100000_hlimit_10000_puts

Hop! I think we found our lower range. The initialization capacity of exactly the size of the collection with a load factor of 1 does very well here, but not like in the whole store.

Collection Size: 100_000. Hash limit: 90_000. A little more realistic than the previous test, here we have a 10% overload in hash codes.

size_100000_hlimit_90000_puts

The bottom left corner is still undesirable. Higher starting capacities work best.

Collection Size: 100_000. Hash Limit: 99_000. Good script, this. Large collection with 1% hash overload.

size_100000_hlimit_99000_puts

Using the exact size of the collection as an init capacity with a load factor of 1 wins here! However, slightly larger initialization capabilities work pretty well.

Collection Size: 100_000. Hash limit: 100_000. Big one. The largest collection with perfect hash function.

size_100000_hlimit_100000_puts

Some amazing things here. Initial capacity with 50% additional room with a load factor of 1 victory.

Good thing this is for puts. Now we will check everything. Remember that the cards below are for the best / worst pickup times; start times are no longer taken into account.

Get Results

Collection size: 100. Hash limit: 50. This means that every hash code should appear twice, and every other key should have collided on the hash map.

<T411>

Eh ... What?

Collection size: 100. Hash limit: 90. One out of ten keys has a repeating hash code.

size_100_hlimit_90_gets

Hey Nelly! This is the most likely scenario that correlates with the survey question, and apparently the initial capacity of 100 with a load factor of 1 is one of the worst things here! I swear I did not fake it.

Collection size: 100. Hash limit: 100. Each key is its own unique hash code. No collisions are expected.

size_100_hlimit_100_gets

It looks a little calmer. Basically the same results in all directions.

Collection size: 1000. Hash limit: 500. As in the first test, there is a hash overload from 2 to 1, but now with a lot more entries.

size_1000_hlimit_500_gets

It seems that any setting will bring a decent result.

Collection size: 1000. Hash limit: 900. This means that one out of ten hash codes will be repeated twice. Intelligent collision scenario.

size_1000_hlimit_900_gets

And just like using puts for this setting, we get an anomaly in a strange place.

Collection size: 1000. Hash limit: 990. Some collisions, but only a few. In this respect, it is quite realistic.

size_1000_hlimit_990_gets

Decent performance everywhere except for a combination of high initial capacity with a low load factor. I would expect this for puts, as one would expect resizing of two hash cards. But why to receive?

Collection size: 1000. Hash limit: 1000. There are no duplicate hash codes, but now with a sample size of 1000.

size_1000_hlimit_1000_gets

Absolutely unpredictable visualization. This seems to work no matter what.

Collection Size: 100_000. Hash Limit: 10_000. Going to 100K again, with a lot of hash code is blocked.

size_100000_hlimit_10000_gets

It does not look beautiful, although bad spots are very localized. The performance here, apparently, largely depends on a certain synergy between the settings.

Collection Size: 100_000. Hash limit: 90_000. A little more realistic than the previous test, here we have a 10% overload in hash codes.

size_100000_hlimit_90000_gets

Large dispersion, although if you squint, you will see an arrow pointing to the upper right corner.

Collection Size: 100_000. Hash Limit: 99_000. Good script, this. Large collection with 1% hash overload.

size_100000_hlimit_99000_gets

Very chaotic. It is hard to find a lot of structure here.

Collection Size: 100_000. Hash limit: 100_000. Big one. The largest collection with perfect hash function.

size_100000_hlimit_100000_gets

Does anyone think this is starting to look like Atari graphics? This, apparently, contributes to the initial capacity of the size of the collection, -25% or + 50%.

Well, it's time to draw conclusions ...

Regarding runtime: you want to avoid initial capacities that are lower than the expected number of entries in the map. If the exact number is known in advance, this number or something a little higher seems to work best. Higher load factors can compensate for lower initial powers due to an earlier hash map change. For higher initial capacities, they do not seem to be so important.
As for getting time: the results here are a bit chaotic. There is nothing to conclude. It seems like he relies heavily on the subtle relationships between hash overlap, initial capacity, and load factor, with some supposedly bad settings working well, and good settings working terribly.
I'm apparently full of crap when it comes to assumptions about Java performance. In truth, if you don't configure your settings for the HashMap implementation, the results will be everywhere. , , 16 , , , , - , .
. 10 1179 5105 . 10 547 3484 . 6 , . , , , .

, . , - , , . , , Java, , . , , , O (n ^ 3).

What is the optimal capacity and load factor for a fixed-size HashMap?

More articles: