In one of my Java 6 projects I have an array of LinkedHashMap instances as input to a method that has to iterate through all keys (i.e. the union of the key sets of all maps) and work with the associated values. Not all keys exist in all maps, and the method should not go through each key more than once or modify the input maps.
My current implementation is as follows:
    Set<Object> keyset = new HashSet<Object>();

    for (Map<Object, Object> map : input) {
        for (Object key : map.keySet()) {
            if (keyset.add(key)) {
                ...
            }
        }
    }
The HashSet instance ensures that no key is acted upon more than once.
Unfortunately, this part of the code is rather performance-critical, as it is called very frequently. In fact, according to the profiler over 10% of the CPU time is spent in the HashSet.add() method.
I am trying to optimize this code as much as possible. The use of LinkedHashMap, with its more efficient iterators compared to the plain HashMap, was a significant boost, but I was hoping to reduce what is essentially bookkeeping to the minimum.
Putting all keys into a HashSet up front using addAll() proved to be less efficient due to the cost of calling HashSet.contains() afterwards. Right now I am looking into whether I can use a bitmap (well, a boolean[] to be exact) to avoid the HashSet completely, but this may not be possible at all, depending on my key range.
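Something along these lines is what I have in mind; the keyIndex() helper and maxKeyIndex are hypothetical, and this only works if the keys can be mapped to a small, dense integer range:

    // Hypothetical sketch: only viable if keys map to a small, dense index range.
    boolean[] seen = new boolean[maxKeyIndex + 1];

    for (Map<Object, Object> map : input) {
        for (Object key : map.keySet()) {
            int idx = keyIndex(key);  // hypothetical key-to-index mapping
            if (!seen[idx]) {
                seen[idx] = true;
                // ... process key ...
            }
        }
    }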
Is there a more efficient way to do this? Preferably something that will not put restrictions on the keys?
EDIT:
A few clarifications and comments:
I do need all the values from the maps; I cannot drop any of them.
I also need to know which map each value came from. The missing part ( ... ) in my code would be something like this:
    for (Map<Object, Object> m : input) {
        Object v = m.get(key);

        // Do something with v (it may be null if the key is absent from m)
    }
A simple example to help understand what I need to do with the maps would be to print them all in parallel, like this:
    Key  Map0  Map1  Map2
    F    1     null  2
    B    2     3     null
    C    null  null  5
    ...
That is not what I actually do, but you should get the idea.
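As a rough sketch, such a parallel printout could be produced like this (illustration only, not my actual code):

    // Illustration only: print the union of all keys, one column per map.
    Set<Object> printed = new HashSet<Object>();

    for (Map<Object, Object> map : input) {
        for (Object key : map.keySet()) {
            if (printed.add(key)) {
                StringBuilder row = new StringBuilder(String.valueOf(key));
                for (Map<Object, Object> m : input) {
                    row.append('\t').append(m.get(key));  // "null" if key absent
                }
                System.out.println(row);
            }
        }
    }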
The input maps are extremely variable. In fact, each call of this method uses a different set of them, so I would not gain anything by caching their keys.
My keys are all String instances. They are also interned on-heap using a separate HashMap, since they are quite repetitive; therefore their hash codes are already cached, and most hash validations (when the HashMap implementation checks whether two keys are actually equal, after their hash codes match) boil down to an identity comparison ( == ). The profiler confirms that only 0.5% of the CPU time is spent in String.equals() and String.hashCode() .
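The interning itself is nothing fancy; a simplified sketch (the pool field and intern() helper are just my names for it here):

    // Simplified sketch of key interning with a plain HashMap as the pool.
    private final Map<String, String> pool = new HashMap<String, String>();

    private String intern(String key) {
        String cached = pool.get(key);
        if (cached != null) {
            return cached;  // reuse the pooled instance, so == comparisons succeed
        }
        pool.put(key, key);
        return key;
    }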
EDIT 2:
Based on the suggestions in the answers, I ran several tests, profiling and benchmarking along the way. I ended up with roughly a 7% increase in performance. What I did:
I set the initial capacity of the HashSet to double the collective size of all input maps. This gained me something in the 1-2% range, by eliminating most (all?) resize() calls in the HashSet.
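In code, the presizing looks roughly like this:

    // Presize the set so it stays below the default 0.75 load factor
    // and (hopefully) never needs to resize().
    int totalSize = 0;
    for (Map<Object, Object> map : input) {
        totalSize += map.size();
    }
    Set<Object> keyset = new HashSet<Object>(2 * totalSize);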
I used Map.entrySet() for the map I am currently iterating. I had originally avoided this approach due to the additional code and the fear that the extra checks and Map.Entry getter method calls would outweigh any advantages. It turned out that the overall code was slightly faster.
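That is, roughly this shape, instead of iterating keySet() and calling get() on the current map:

    for (Map<Object, Object> map : input) {
        for (Map.Entry<Object, Object> entry : map.entrySet()) {
            if (keyset.add(entry.getKey())) {
                Object v = entry.getValue();  // no separate get() lookup needed
                // ... process v, then fetch this key's values from the other maps ...
            }
        }
    }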
I am sure some people will start screaming at me, but here it is: raw types. More specifically, I used the raw form of HashSet in the code above. Since I was already using Object as its content type, I do not lose any type safety. The cost of that useless checkcast operation when calling HashSet.add() was apparently important enough to produce a 4% increase in performance when removed. Why the JVM should insist on checking casts to Object is beyond me...
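For reference, the only change is in the declaration; the two forms behave identically here:

    // Generic form, which incurred the checkcast on add() in my profile:
    Set<Object> typed = new HashSet<Object>(2 * totalSize);

    // Raw form, which I ended up using; nothing lost, the content type was Object anyway:
    Set raw = new HashSet(2 * totalSize);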
thkala