If you want a hash table holding a million records, you should usually have at least 2 million buckets. I don't remember all the statistics (the key term is "birthday paradox"), but the vast majority of buckets will have zero or one element. In principle you could be very unlucky and get all the items in one bucket, but you would have to be even unluckier than those people who seem to get struck by lightning every other day.
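To see the shape of that distribution, here is a quick simulation sketch in Python (mine, not part of the original claim), assuming a hash function that spreads keys uniformly and using the million-items / two-million-buckets figures above:

```python
import random
from collections import Counter

# Illustrative numbers from the paragraph above: 1 million items
# distributed over 2 million buckets (load factor 0.5).
N_ITEMS = 1_000_000
N_BUCKETS = 2_000_000

# Model a well-behaved hash function by drawing a random bucket per item.
occupancy = Counter(random.randrange(N_BUCKETS) for _ in range(N_ITEMS))

# Tally how many buckets ended up with 1, 2, 3, ... items.
sizes = Counter(occupancy.values())
empty = N_BUCKETS - len(occupancy)

print(f"empty buckets:          {empty}")
for size in sorted(sizes):
    print(f"buckets with {size} item(s): {sizes[size]}")
```

At a load factor of 0.5 you should see roughly 60% of buckets empty and another 30% holding exactly one item, with chains of three or more accounting for under 2% of buckets.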
For hash tables that grow, the normal trick is to grow by a constant factor - the usual textbook case is doubling the size of the table. You do this whenever the number of items in the hash table reaches a certain fraction of the number of buckets (the load factor), regardless of how many buckets are actually occupied. This gives amortized expected O(1) performance for insert, delete, and lookup.
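To make the growth rule concrete, here is a minimal sketch of a separately chained table that doubles when the load factor crosses a threshold. The class name, the 0.75 threshold, and the initial 8 buckets are arbitrary choices of mine, not anything standard:

```python
class ChainedHashTable:
    """Minimal separate-chaining hash table that doubles its bucket
    array when the load factor (items / buckets) crosses a threshold."""

    def __init__(self, initial_buckets=8, max_load=0.75):
        self._buckets = [[] for _ in range(initial_buckets)]
        self._count = 0
        self._max_load = max_load

    def _bucket_for(self, key, buckets):
        return buckets[hash(key) % len(buckets)]

    def insert(self, key, value):
        bucket = self._bucket_for(key, self._buckets)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # overwrite an existing key
                return
        bucket.append((key, value))
        self._count += 1
        # Resize based on the total item count, not on how many buckets
        # happen to be occupied.
        if self._count / len(self._buckets) > self._max_load:
            self._resize(2 * len(self._buckets))

    def _resize(self, new_size):
        new_buckets = [[] for _ in range(new_size)]
        for bucket in self._buckets:
            for key, value in bucket:
                self._bucket_for(key, new_buckets).append((key, value))
        self._buckets = new_buckets

    def lookup(self, key):
        for k, v in self._bucket_for(key, self._buckets):
            if k == key:
                return v
        raise KeyError(key)

    def delete(self, key):
        bucket = self._bucket_for(key, self._buckets)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                self._count -= 1
                return
        raise KeyError(key)
```

Because the table doubles each time, the total cost of all rehashes over n insertions is O(n), which is where the amortized O(1) bound comes from.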
The linked list in each bucket of the hash table is just a way of handling collisions - unlikely for any individual operation, but over the lifetime of a sizeable hash table they do occur, especially once the table gets more than half full.
Linked lists are not the only way to handle collisions - there is a huge body of knowledge on this topic. Walter Bright (the developer of the D programming language) has advocated using binary trees rather than linked lists, claiming that his Dscript got a significant performance improvement over JavaScript from this design.
He used plain (unbalanced) binary trees when I asked, so the worst-case performance is the same as for linked lists, but I assume the key point is that the tree-handling code is simple, and the hash table itself makes the odds of building large unbalanced trees very small.
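The idea, as I understand it (the sketch below is mine, in Python, not Walter Bright's actual code), is simply to replace each bucket's list with a plain binary search tree keyed on the stored keys; nothing about the surrounding table has to change:

```python
class TreeNode:
    """One entry in a bucket's plain (unbalanced) binary search tree."""
    __slots__ = ("key", "value", "left", "right")

    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = self.right = None


def bucket_insert(node, key, value):
    """Insert into a bucket's BST, ordered by key (keys must be comparable).
    Returns the (possibly new) root of the bucket's tree."""
    if node is None:
        return TreeNode(key, value)
    if key == node.key:
        node.value = value          # overwrite an existing key
    elif key < node.key:
        node.left = bucket_insert(node.left, key, value)
    else:
        node.right = bucket_insert(node.right, key, value)
    return node


def bucket_lookup(node, key):
    """Walk the bucket's BST; raises KeyError if the key is absent."""
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    raise KeyError(key)
```

Note that this requires keys within a bucket to be mutually comparable, and the worst case for a single bucket is still linear, exactly as with a list.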
In principle you could just as easily use treaps, red-black trees, or AVL trees. An interesting option would be to use splay trees for collision handling. But overall this is a minor issue for a few library designers, and the truly obsessive, to worry about.