HashMap Slower After Deserialization - Why?

I have a pretty big HashMap (~250 MB). Populating it takes about 50-55 seconds, so I decided to serialize it and save it to a file. Reading it back from the file takes about 16-17 seconds.

The only problem is that lookups seem to be slower. I always thought the hash map is read from the file into memory, so the performance should be the same as when I build the hash map myself, right? Here is the code I use to read the HashMap from the file:

    File file = new File("omaha.ser");
    FileInputStream f = new FileInputStream(file);
    ObjectInputStream s = new ObjectInputStream(new BufferedInputStream(f));
    omahaMap = (HashMap<Long, Integer>) s.readObject();
    s.close();

300 million lookups take about 3.1 seconds when I create the hash map myself, and about 8.5 seconds when I read the same map from the file. Does anyone have an idea why? Am I overlooking something obvious?

EDIT:

I "measured" the time simply by taking timestamps with System.nanoTime(), so no proper benchmarking method was used. Here is the code:

    public class HandEvaluationTest {

        public static void Test() {
            HandEvaluation.populate5Card();
            HandEvaluation.populate9CardOmaha();

            Card[] player1cards = {new Card("4s"), new Card("2s"), new Card("8h"), new Card("4d")};
            Card[] player2cards = {new Card("As"), new Card("9s"), new Card("6c"), new Card("2h")};
            Card[] player3cards = {new Card("9h"), new Card("7h"), new Card("Kc"), new Card("Kh")};
            Card[] table = {new Card("2d"), new Card("2c"), new Card("3c"), new Card("5c"), new Card("4h")};

            int j = 0, k = 0, l = 0;
            long startTime = System.nanoTime();
            for (int p = 0; p < 100000000; p++) {
                j = HandEvaluation.handEval9Hash(player1cards, table);
                k = HandEvaluation.handEval9Hash(player2cards, table);
                l = HandEvaluation.handEval9Hash(player3cards, table);
            }
            long estimatedTime = System.nanoTime() - startTime;
            System.out.println("Time needed: " + estimatedTime * Math.pow(10, -6) + "ms");
            System.out.println("Handstrength Player 1: " + j);
            System.out.println("Handstrength Player 2: " + k);
            System.out.println("Handstrength Player 3: " + l);
        }
    }

Most of the HashMap work is done in HandEvaluation.populate9CardOmaha(); the 5-card map is small. Here is the code for the big one:

    public static void populate9CardOmaha() {
        // Check if the hashmap is already there - then just read it and exit
        File hashmap = new File("omaha.ser");
        if (hashmap.exists()) {
            try {
                File file = new File("omaha.ser");
                FileInputStream f = new FileInputStream(file);
                ObjectInputStream s = new ObjectInputStream(new BufferedInputStream(f));
                omahaMap = (HashMap<Long, Integer>) s.readObject();
                s.close();
            } catch (IOException ioex) {
                ioex.printStackTrace();
            } catch (ClassNotFoundException cnfex) {
                System.out.println("Class not found");
                cnfex.printStackTrace();
                return;
            }
            return;
        }

        // If it is not there, populate it yourself
        ... Code for populating hashmap ...

        // ... and then save it to a file
        try {
            File file = new File("omaha.ser");
            FileOutputStream f = new FileOutputStream(file);
            ObjectOutputStream s = new ObjectOutputStream(new BufferedOutputStream(f));
            s.writeObject(omahaMap);
            s.close();
        } catch (IOException ioex) {
            ioex.printStackTrace();
        }
    }

When I populate the map myself (= the file is not there), the lookup in HandEvaluationTest.Test() is the faster case: about 3 seconds versus roughly 8 with the deserialized map. Could this just be down to my naive way of measuring elapsed time?

2 answers

This question was interesting, so I wrote my own test case to verify it. I found no difference in lookup speed between a live map and one loaded from a serialized file. The program is available at the end of this answer for anyone interested in running it.

  • Methods were profiled using JProfiler.
  • The serialized file is comparable in size to yours: ~230 MB.
  • In-memory lookups cost 1210 ms without any serialization.

[JProfiler screenshot: lookup timings on the in-memory map]

  • After serializing the map and reading it back, the lookup cost stayed practically the same: 1224 ms.

[JProfiler screenshot: lookup timings after deserialization]

  • The profiler was configured to add minimal overhead in both scenarios.
  • Measured on the Java(TM) SE Runtime Environment (build 1.6.0_25-b06), 4 CPUs at 1.7 GHz, 4 GB RAM at 800 MHz.

Measuring this is tricky. I did notice the 8-second lookup time you describe myself, but guess what else I noticed when that happened:

GC Activities

[JProfiler screenshot: GC activity during the run]

Your measurements are probably picking this up too. If you isolate just the Map.get() timings, you will see that the results are comparable.


    import java.io.BufferedInputStream;
    import java.io.BufferedOutputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Random;

    public class GenericTest {

        public static void main(String... args) {
            // Call the methods as you please for a live vs. ser <-> de-ser run
        }

        private static Map<Long, Integer> generateHashMap() {
            Map<Long, Integer> map = new HashMap<Long, Integer>();
            final Random random = new Random();
            for (int counter = 0; counter < 10000000; counter++) {
                final int value = random.nextInt();
                final long key = random.nextLong();
                map.put(key, value);
            }
            return map;
        }

        private static void lookupItems(int n, Map<Long, Integer> map) {
            final Random random = new Random();
            for (int counter = 0; counter < n; counter++) {
                final long key = random.nextLong();
                final Integer value = map.get(key);
            }
        }

        private static void serialize(Map<Long, Integer> map) {
            try {
                File file = new File("temp/omaha.ser");
                FileOutputStream f = new FileOutputStream(file);
                ObjectOutputStream s = new ObjectOutputStream(new BufferedOutputStream(f));
                s.writeObject(map);
                s.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        private static Map<Long, Integer> deserialize() {
            try {
                File file = new File("temp/omaha.ser");
                FileInputStream f = new FileInputStream(file);
                ObjectInputStream s = new ObjectInputStream(new BufferedInputStream(f));
                HashMap<Long, Integer> map = (HashMap<Long, Integer>) s.readObject();
                s.close();
                return map;
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }
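For reference, here is one way the main() stub above might be filled in; this driver is my own sketch, not part of the original test program. It warms up the JIT on each map first and keeps the serialization round trip outside the timed region, so only the Map.get() cost is compared:

        // A possible body for GenericTest.main(), reusing the private helpers above (sketch only)
        public static void main(String... args) {
            Map<Long, Integer> live = generateHashMap();

            // Warm-up pass so JIT compilation does not distort the first measurement
            lookupItems(1000000, live);

            long start = System.nanoTime();
            lookupItems(100000000, live);
            System.out.println("Live map:         " + (System.nanoTime() - start) / 1e6 + " ms");

            // Serialization round trip stays outside the timed region
            serialize(live);
            Map<Long, Integer> restored = deserialize();
            lookupItems(1000000, restored);   // warm up on the restored map as well

            start = System.nanoTime();
            lookupItems(100000000, restored);
            System.out.println("Deserialized map: " + (System.nanoTime() - start) / 1e6 + " ms");
        }

The printed numbers will of course still vary with heap size and GC settings, which is exactly why watching GC activity alongside the timings matters.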

"300 million lookups take about 3.1 seconds when I create the hash map myself, and about 8.5 seconds when I read the same map from the file. Does anyone have an idea why? Am I overlooking something obvious?"

One possible cause is that the reconstructed HashMap may not have the same capacity (number of buckets) as the original one, which can increase the frequency of hash collisions or (if the capacity is larger) reduce the locality of main memory access (resulting in more cache misses). To verify, use a debugger to inspect the length of map.table before and after reconstruction. If that is the case, try copying the data into a new HashMap with an appropriate loadFactor.
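As an illustration of that suggestion, here is a minimal sketch assuming the OpenJDK/HotSpot HashMap with its internal "table" field; the class and method names are mine. It reads the bucket count via reflection as a stand-in for the debugger check, then repacks the entries into a pre-sized HashMap (on recent JDKs the reflective access may need an --add-opens flag):

    import java.lang.reflect.Field;
    import java.util.HashMap;
    import java.util.Map;

    public class CapacityCheck {

        // Stand-in for inspecting map.table in a debugger: returns the current
        // number of buckets, assuming the implementation stores them in "table".
        static int bucketCount(HashMap<?, ?> map) throws Exception {
            Field tableField = HashMap.class.getDeclaredField("table");
            tableField.setAccessible(true);
            Object[] table = (Object[]) tableField.get(map);
            return table == null ? 0 : table.length;
        }

        // Copy the deserialized map into a HashMap pre-sized so that no resize
        // happens while copying; 0.75f is the default load factor.
        static <K, V> HashMap<K, V> repack(Map<K, V> source, float loadFactor) {
            int capacity = (int) Math.ceil(source.size() / loadFactor) + 1;
            HashMap<K, V> copy = new HashMap<K, V>(capacity, loadFactor);
            copy.putAll(source);
            return copy;
        }
    }

Comparing bucketCount(...) right after populating and right after deserializing shows whether the table geometry really differs; if it does, the repacked copy can be benchmarked against the deserialized one.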

As for why serialization does not preserve the capacity: HashMap customizes its serialization format (it would be wasteful to serialize null for every empty table slot) by providing writeObject and readObject methods, and it ignores the capacity it finds in the input stream:

    /**
     * Reconstitute the {@code HashMap} instance from a stream (i.e.,
     * deserialize it).
     */
    private void readObject(java.io.ObjectInputStream s)
        throws IOException, ClassNotFoundException {
        // Read in the threshold (ignored), loadfactor, and any hidden stuff
        s.defaultReadObject();
        reinitialize();
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new InvalidObjectException("Illegal load factor: " +
                                             loadFactor);
        s.readInt();                // Read and ignore number of buckets
        int mappings = s.readInt(); // Read number of mappings (size)
        if (mappings < 0)
            throw new InvalidObjectException("Illegal mappings count: " +
                                             mappings);
        else if (mappings > 0) { // (if zero, use defaults)
            // Size the table using given load factor only if within
            // range of 0.25...4.0
            float lf = Math.min(Math.max(0.25f, loadFactor), 4.0f);
            float fc = (float)mappings / lf + 1.0f;
            int cap = ((fc < DEFAULT_INITIAL_CAPACITY) ?
                       DEFAULT_INITIAL_CAPACITY :
                       (fc >= MAXIMUM_CAPACITY) ?
                       MAXIMUM_CAPACITY :
                       tableSizeFor((int)fc));
            float ft = (float)cap * lf;
            threshold = ((cap < MAXIMUM_CAPACITY && ft < MAXIMUM_CAPACITY) ?
                         (int)ft : Integer.MAX_VALUE);
            @SuppressWarnings({"rawtypes","unchecked"})
            Node<K,V>[] tab = (Node<K,V>[])new Node[cap];
            table = tab;

            // Read the keys and values, and put the mappings in the HashMap
            for (int i = 0; i < mappings; i++) {
                @SuppressWarnings("unchecked")
                K key = (K) s.readObject();
                @SuppressWarnings("unchecked")
                V value = (V) s.readObject();
                putVal(hash(key), key, value, false, false);
            }
        }
    }

I suspect it ignores the number of buckets to prevent a denial-of-service attack in which an attacker crafts a serialization stream with an unrealistically high (or low) number of buckets, causing an OutOfMemoryError (or excessive CPU load from hash collisions). That would be a cheap way to mount a denial-of-service attack against any application that accepts serialized data from untrusted sources (CVE-2012-2739 describes this kind of problem).

