Efficiently iterating over the combined key sets of multiple Java Maps

In one of my Java 6 projects, I have an array of LinkedHashMap instances as input to a method that must iterate over all keys (that is, over the union of the key sets of all the maps) and work with the associated values. Not every key exists in every map, and the method must not visit any key more than once or modify the input maps.

My current implementation is as follows:

    Set<Object> keyset = new HashSet<Object>();
    for (Map<Object, Object> map : input) {
        for (Object key : map.keySet()) {
            if (keyset.add(key)) {
                // ...
            }
        }
    }

The HashSet instance ensures that no key is processed more than once.

Unfortunately, this part of the code is quite performance-critical, as it is called very often. In fact, according to the profiler, over 10% of the CPU time is spent in the HashSet.add() method.

I am trying to optimize this code as much as possible. Switching to LinkedHashMap, with its more efficient iterators compared to a plain HashMap, was already a significant win, but I was hoping to minimize what is essentially just bookkeeping.

Migrating all the keys into a single HashSet up front using addAll() turned out to be less efficient, because of the cost of the subsequent HashSet.contains() calls. Right now I am looking into whether I can use a bitmap (well, a boolean[], to be precise) to avoid the HashSet completely, but that may not be possible at all, depending on the range of my keys.
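To make the bitmap idea concrete, here is a minimal sketch of what I have in mind, assuming every distinct key could somehow be assigned a small, dense integer index; the IndexedKey class and the totalDistinctKeys variable are hypothetical and do not exist in my code:

    // Hypothetical sketch: assumes each distinct key carries a dense integer
    // index (0, 1, 2, ...) assigned when the key is created.
    class IndexedKey {
        final String value;
        final int index;   // dense index, unique per distinct key

        IndexedKey(String value, int index) {
            this.value = value;
            this.index = index;
        }
    }

    // seen[index] replaces HashSet.add(): one array read/write, no hashing.
    boolean[] seen = new boolean[totalDistinctKeys];
    for (Map<IndexedKey, Object> map : input) {
        for (IndexedKey key : map.keySet()) {
            if (!seen[key.index]) {
                seen[key.index] = true;
                // ... work with the key as before ...
            }
        }
    }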

Is there a more efficient way to do this? Preferably something that does not place restrictions on the keys.

EDIT:

A few clarifications and comments:

  • I need all the values from the maps; I cannot drop any of them.

  • I also need to know which map each value came from. The elided part ( ... ) in my code would look something like this:

      for (Map<Object, Object> m : input) {
          Object v = m.get(key);
          // Do something with v
      }

    A simple example of what I need to do with the maps would be to print them all in parallel, like this:

      Key  Map0  Map1  Map2
      F    1     null  2
      B    2     3     null
      C    null  null  5
      ...

    This is not what I actually do, but you should get the idea. (A fuller sketch of this example appears right after this list.)

  • The input maps are extremely variable. In fact, each call to this method uses a different set of them, so I would gain nothing by caching their key sets.

  • My keys are all String instances. They are interned on the heap using a separate HashMap, since they are quite repetitive, so their hash codes are already cached and most hash validations (when the HashMap implementation checks whether two keys are really equal, after their hash codes match) boil down to an identity comparison ( == ). The profiler confirms that only 0.5% of the CPU time is spent in String.equals() and String.hashCode().
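For completeness, here is a sketch of the parallel-printing example above, reusing the same deduplication loop as my current implementation (the output formatting is simplified):

    // Prints each key exactly once, followed by its value (or null) in
    // every map, as in the table above.
    Set<Object> keyset = new HashSet<Object>();
    for (Map<Object, Object> map : input) {
        for (Object key : map.keySet()) {
            if (keyset.add(key)) {                    // first time we see this key
                StringBuilder row = new StringBuilder().append(key);
                for (Map<Object, Object> m : input) {
                    row.append('\t').append(m.get(key));  // null if the map lacks the key
                }
                System.out.println(row);
            }
        }
    }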

EDIT 2:

Based on the suggestions in the answers, I ran several tests, profiling and benchmarking along the way, and ended up with roughly a 7% performance increase. What I did (the resulting loop is sketched after this list):

  • I set the initial capacity of the HashSet to double the combined size of all input maps. This bought me something in the 1-2% range by eliminating most (all?) of the resize() calls inside the HashSet.

  • I now use Map.entrySet() for the map I am currently iterating over. I had originally avoided this approach because of the extra code, and because I feared that the extra checks and Map.Entry getter calls would outweigh any benefit. It turned out that the overall code was slightly faster.

  • I am sure some people will start yelling at me, but here it is: raw types. More specifically, I used the raw form of HashSet in the code above. Since I was already using Object as its content type, I lose no type safety. The cost of the useless checkcast operation on the HashSet.add() call was apparently significant enough that removing it produced a 4% performance gain. Why the JVM insists on checking casts to Object is beyond me...
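For reference, a sketch of what the loop looks like after these changes; totalSize is computed from the inputs, and the raw HashSet is deliberate, as explained above:

    // Rough shape of the code after the three changes above.
    int totalSize = 0;
    for (Map<Object, Object> map : input) {
        totalSize += map.size();
    }

    Set keyset = new HashSet(2 * totalSize);   // raw type, presized to avoid resize()
    for (Map<Object, Object> map : input) {
        for (Map.Entry<Object, Object> entry : map.entrySet()) {
            if (keyset.add(entry.getKey())) {
                // ... work with entry.getKey() and entry.getValue() ...
            }
        }
    }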

4 answers

I cannot provide a replacement for your approach, but here are a few suggestions to (slightly) optimize the existing code:

  • Consider initializing the hash set with an explicit capacity (the sum of the sizes of all maps). This avoids or reduces resizing of the set during add operations.
  • Do not use keySet(), as it will always create a new set in the background. Use entrySet(), which should be much faster.
  • Look at your equals() and hashCode() implementations: if they are expensive, they will negatively affect the add calls.
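To illustrate the last point: if hashCode() is expensive, caching its result the way java.lang.String does usually helps. A minimal sketch, with a purely hypothetical key class:

    // Hypothetical key class that caches its hash code, as java.lang.String does.
    final class CachingKey {
        private final String name;
        private int hash;   // 0 means "not computed yet" (the same trick String uses)

        CachingKey(String name) {
            this.name = name;
        }

        @Override
        public int hashCode() {
            int h = hash;
            if (h == 0) {             // computed at most once per instance
                h = name.hashCode();
                hash = h;
            }
            return h;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof CachingKey)) return false;
            return name.equals(((CachingKey) o).name);
        }
    }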

Whether you can avoid using a HashSet depends on what you are doing.

I would compute the union only once, whenever input changes. That should be relatively rare compared to the number of lookups.

    // on an update:
    Map<Key, Value> union = new LinkedHashMap<Key, Value>();
    for (Map<Key, Value> map : input)
        union.putAll(map);

    // on a lookup:
    Value value = union.get(key);

    // process each key once:
    for (Map.Entry<Key, Value> entry : union.entrySet()) {
        // do something.
    }

Option A is to use the .values() method and iterate over it. But I suppose you have already thought of that.

If the code is called that often, it may be worth creating additional structures (depending on how often the data changes). Create a new HashMap: every key that occurs in any of your hash maps becomes a key in it, mapped to a list of the HashMaps in which that key appears.

This will help if the data is somewhat static (relative to the frequency of queries), so that the overhead of managing the extra structure stays relatively small, and if the key space is not very dense (keys do not repeat much across the different HashMaps), since it will save a lot of unnecessary contains() calls.

Of course, if you are mixing data structures like this, it is better to encapsulate the whole thing in a dedicated data structure, as sketched below.
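A sketch of what such an encapsulating structure could look like (all names here are illustrative):

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Illustrative sketch: for every key, remember which of the underlying
    // maps actually contain it, so lookups never miss.
    class MapGroup {
        private final Map<Object, List<Map<Object, Object>>> index =
                new HashMap<Object, List<Map<Object, Object>>>();

        void add(Map<Object, Object> map) {
            for (Object key : map.keySet()) {
                List<Map<Object, Object>> owners = index.get(key);
                if (owners == null) {
                    owners = new ArrayList<Map<Object, Object>>();
                    index.put(key, owners);
                }
                owners.add(map);
            }
        }

        // Visits each distinct key once, touching only the maps that hold it.
        void processAll() {
            for (Map.Entry<Object, List<Map<Object, Object>>> e : index.entrySet()) {
                for (Map<Object, Object> m : e.getValue()) {
                    Object v = m.get(e.getKey());
                    // ... do something with v ...
                }
            }
        }
    }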

