You can make a hybrid solution. Make a hash table where each slot is a binary tree. Say 1024 slots, each with a binary tree. Collisions on the hash go to the binary tree, no need to block everything on the insert, just the tree that needs to be updated.
In addition, if you use atomic operations and some kind of trick, you can make it even more parallel, avoiding full blocking. Atomic update to insert a new node, if atomic update shuts down, another thread adds node, just loop again until you succeed.
I already did this https://github.com/exodist/GSD/tree/master/Structures
This is a highly competitive hash / tree hybrid. It uses atomic operations, not mutexes. It can rotate on the insert, but it never blocks reading or updating existing keys. It can be changed / balanced / etc. You can move all entries, etc. Manages its own memory and has callbacks for ref-counting references / values. You can also โbindโ keys in one dictionary to another, so updating the key in the first changes the value of the second.
This project really increases productivity when you throw more problems when asking a problem:
(T) - how many threads are ignored by MI, slots - how many hash slots are used, and Ops - the number of elements that are inserted (divided by threads, to see how much each thread will be executed)
./performance.run T, MI, Slots, Ops: Insert Rebalance Update Resize Lookup Immute 4, 16, 1024, 5000000: 2.765500363 0.915232065 2.540659476 2.172654813 2.545409560 2.089993698 13.029449975 4, 16, 32768, 5000000: 2.122459866 1.403548309 2.413483374 1.885083901 2.092272466 2.643681518 12.560529434 4, 16, 1048576, 5000000: 1.700994809 1.063704010 2.030809367 2.457275707 1.453413924 3.671012425 12.377210242 16, 16, 1024, 5000000: 0.785675781 2.311281904 1.805610753 0.621521146 0.549546473 0.744009412 6.817645469 16, 16, 32768, 5000000: 0.497359638 0.316017553 1.257663142 0.610761414 0.390849355 0.825944608 3.898595710 16, 16, 1048576, 5000000: 0.328308989 0.647632801 1.267230625 1.139402693 0.342399827 1.189220470 4.914195405 64, 16, 1024, 5000000: 0.129407132 0.767262021 2.631929019 0.157977313 0.103848004 0.177964574 3.968388063 64, 16, 32768, 5000000: 0.087656606 0.068330231 1.365794852 0.166261966 0.079112728 0.203542885 1.970699268 64, 16, 1048576, 5000000: 0.074605680 0.284322979 1.372998607 0.650503349 0.084956938 0.828653807 3.296041360
Notes: Single launch on i7 with 8 GB of RAM. Uses the atomic gcc built-in kernels in the transition from the old __sync to __atomic, which may help performance due to memory models.