I have a huge list of records (≈ 10 million). Each record provides two hash functions:
- Cheap: very quick to compute, but its distribution is terrible (it can put 99% of the items into 1% of the hash space).
- Expensive: slow to compute, but with a much better distribution.
A regular Dictionary only lets me use one of these hash functions. I'd like a dictionary that uses the cheap hash first and falls back to the expensive hash only to resolve collisions.
Using a dictionary of dictionaries seemed like a good fit for this. I'm currently using this monster:
Dictionary<int, Dictionary<int, List<Foo>>>;
I've refined this design so that the expensive hash is only computed when two elements actually share the same cheap hash.
It fits my needs and does the job impeccably, but it looks like something that should have gone extinct 65 million years ago.
As far as I know, this functionality isn't part of the standard framework. I'm about to write a DoubleHashedDictionary class, but first I'd like to hear your opinion.
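To make the idea concrete, here is a minimal sketch of what I mean (not my actual code; the DoubleHashedSet name, the Bucket helper and the delegate-based constructor are all placeholders, and it stores bare items rather than key/value pairs to keep it short):

using System;
using System.Collections.Generic;

// Sketch of a collection keyed by a cheap and an expensive hash function.
public class DoubleHashedSet<T>
{
    private readonly Func<T, int> _cheapHash;
    private readonly Func<T, int> _expensiveHash;

    // One bucket per cheap-hash value. While a bucket holds a single item,
    // no expensive hash is computed; the inner dictionary is built lazily
    // on the first cheap-hash collision.
    private sealed class Bucket
    {
        public T Single;
        public Dictionary<int, List<T>> ByExpensive; // null until a collision occurs
    }

    private readonly Dictionary<int, Bucket> _buckets = new Dictionary<int, Bucket>();

    public DoubleHashedSet(Func<T, int> cheapHash, Func<T, int> expensiveHash)
    {
        _cheapHash = cheapHash;
        _expensiveHash = expensiveHash;
    }

    public void Add(T item)
    {
        int cheap = _cheapHash(item);
        Bucket bucket;
        if (!_buckets.TryGetValue(cheap, out bucket))
        {
            // First item with this cheap hash: the expensive hash is not needed yet.
            _buckets[cheap] = new Bucket { Single = item };
            return;
        }

        if (bucket.ByExpensive == null)
        {
            // First collision on this cheap hash: promote the bucket and pay
            // for the expensive hash of the item that was already stored.
            bucket.ByExpensive = new Dictionary<int, List<T>>();
            AddToInner(bucket.ByExpensive, _expensiveHash(bucket.Single), bucket.Single);
            bucket.Single = default(T);
        }
        AddToInner(bucket.ByExpensive, _expensiveHash(item), item);
    }

    public bool Contains(T item)
    {
        int cheap = _cheapHash(item);
        Bucket bucket;
        if (!_buckets.TryGetValue(cheap, out bucket))
            return false;

        if (bucket.ByExpensive == null)
            return EqualityComparer<T>.Default.Equals(bucket.Single, item);

        List<T> list;
        return bucket.ByExpensive.TryGetValue(_expensiveHash(item), out list)
            && list.Contains(item);
    }

    private static void AddToInner(Dictionary<int, List<T>> inner, int expensive, T item)
    {
        List<T> list;
        if (!inner.TryGetValue(expensive, out list))
        {
            list = new List<T>();
            inner[expensive] = list;
        }
        list.Add(item);
    }
}

The point of the Bucket indirection is that a bucket holding a single item never pays for the expensive hash; only the first collision promotes it to the inner Dictionary<int, List<T>>.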
As for my specific case:
- First hash function: the number of files in a file-system directory (fast).
- Second hash function: the sum of the file sizes (slow).
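For illustration, those two functions would look something like this, assuming each record is identified by a directory path (DirectoryHashes and the method names are placeholders of mine):

using System.IO;
using System.Linq;

static class DirectoryHashes
{
    // Cheap: just count the entries in the directory.
    public static int CheapHash(string directory)
    {
        return Directory.GetFiles(directory).Length;
    }

    // Expensive: sum the sizes of all files, which touches every file's metadata.
    public static int ExpensiveHash(string directory)
    {
        long total = Directory.GetFiles(directory).Sum(path => new FileInfo(path).Length);
        return unchecked((int)total); // fold the 64-bit sum into an int hash
    }
}

These two methods are what I would pass as the cheapHash and expensiveHash delegates of the sketch above.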
Edit:
- Changed the title and added more information.
- Added some fairly important missing details.