Edit #2:
Well, I broke my own first rule - never optimize prematurely. The worst case for this is probably a stock HashMap with a wide range of values - so I just tried it. It still runs in about a second, so forget about everything else here and just do that.
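For reference, the stock approach can be sketched roughly as below. This is only a minimal illustration, not the code that was actually timed: the class name `HashMapCount`, the method `count`, and the sample values are made up here.

```java
import java.util.HashMap;
import java.util.Map;

class HashMapCount {
    // Counts how many times each value occurs, using the stock HashMap.
    static Map<Integer, Integer> count(int[] values) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int v : values) {
            // merge() stores 1 for a new key, otherwise adds 1 to the existing count.
            counts.merge(v, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] sample = {5, 3, 5, 1_000_000, 3, 5}; // made-up sample data
        System.out.println(count(sample).get(5)); // prints 3
    }
}
```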
And I will make ANOTHER note to myself to ALWAYS test the speed before worrying about tricky implementations.
(Below is the older, obsolete post, which could still be valid if someone has MANY more points than a million.)
A HashMap will work, but if your integers have a reasonable range (say 1-1000), it would be more efficient to create an array of 1000 integers and, for each of your million integers, increment the matching element of the array. (Pretty much the same idea as a HashMap, but it optimizes out a few unknowns that a hash has to allow for, which should make it several times faster.)
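A rough sketch of the array idea, assuming every value is known in advance to lie between `min` and `max` inclusive. The class name `ArrayCount` and its parameters are illustrative only, and a value outside the range will throw an `ArrayIndexOutOfBoundsException`.

```java
class ArrayCount {
    // Counts occurrences of values assumed to lie in [min, max].
    // counts[i] holds the number of times (min + i) was seen.
    static int[] count(int[] values, int min, int max) {
        int[] counts = new int[max - min + 1];
        for (int v : values) {
            counts[v - min]++; // plain int arithmetic: no hashing, no boxing
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] sample = {1, 1000, 7, 7, 1, 7}; // made-up sample data
        int[] counts = count(sample, 1, 1000);
        System.out.println(counts[7 - 1]); // prints 3
    }
}
```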
You can also create a tree. Each node in the tree will contain (value, count), and the tree will be organized by value (lower values on the left, higher on the right). Walk down to the node for each value: if it does not exist, insert it; if it does, just increment its count.
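The tree described above might look something like this. It is a bare-bones sketch with invented names (`CountTree`, `add`, `countOf`), and it does no rebalancing, so input that arrives already sorted would degrade it into a linked list.

```java
class CountTree {
    // One node per distinct value, holding the (value, count) pair.
    private static final class Node {
        final int value;
        int count = 1;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    private Node root;

    // Walks to the node for value; inserts it if missing, else bumps its count.
    void add(int value) {
        if (root == null) { root = new Node(value); return; }
        Node n = root;
        while (true) {
            if (value == n.value) { n.count++; return; }
            if (value < n.value) {
                if (n.left == null) { n.left = new Node(value); return; }
                n = n.left;   // lower values live on the left
            } else {
                if (n.right == null) { n.right = new Node(value); return; }
                n = n.right;  // higher values live on the right
            }
        }
    }

    // Returns how many times value was added, or 0 if it was never seen.
    int countOf(int value) {
        Node n = root;
        while (n != null) {
            if (value == n.value) return n.count;
            n = value < n.value ? n.left : n.right;
        }
        return 0;
    }
}
```

Because the nodes hold primitive int fields, nothing is boxed on the way in or out.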
The range and distribution of your values will determine which of these two (or a regular hash) works best. I think that a regular hash will not have many “winning” cases (it would need a wide range and “clumped” data, and even then the tree might win).
Since this is pretty trivial, I recommend that you implement more than one solution and test their speed against the actual data set.
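A crude way to run that comparison is sketched below. Everything in it is an assumption for illustration: the class `CountTiming`, the randomly generated values standing in for the real data set, and the range of 1000. A single `System.nanoTime()` measurement is only a rough indicator, since JIT warm-up skews it, so treat the numbers as a sanity check rather than a proper benchmark.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

class CountTiming {
    static Map<Integer, Integer> countWithMap(int[] values) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int v : values) counts.merge(v, 1, Integer::sum);
        return counts;
    }

    // Assumes every value lies in [0, range).
    static int[] countWithArray(int[] values, int range) {
        int[] counts = new int[range];
        for (int v : values) counts[v]++;
        return counts;
    }

    // Runs the task once and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int range = 1000;
        // Stand-in data: replace with the actual data set being measured.
        int[] data = new Random(42).ints(1_000_000, 0, range).toArray();
        System.out.println("HashMap: " + timeMillis(() -> countWithMap(data)) + " ms");
        System.out.println("array:   " + timeMillis(() -> countWithArray(data, range)) + " ms");
    }
}
```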
Edit: Re the comment
TreeMap would work, but it would still add a layer of indirection (and a tree is surprisingly easy and fun to implement yourself). If you use the stock implementation, you have to use Integer objects and constantly convert to and from int for every increment. There is the extra indirection of a pointer to the Integer, and the fact that you would be storing at least 2x as many objects. This does not even count the overhead of method calls, since with any luck they will be inlined.
Normally this would be an optimization (evil), but when you start to approach hundreds of thousands of nodes, you occasionally have to ensure efficiency, so the built-in TreeMap will be inefficient for the same reasons as the built-in HashMap.
Bill k