How to improve the performance of a hash table with 1 million items and 997 buckets?

This is an interview question.

Suppose the table has 1 million items and 997 buckets of unordered lists. Next, suppose the hash function distributes keys with equal probability (i.e., each bucket holds roughly 1000 elements).

What is the worst-case time to search for an item that is not in the table? For one that is in the table? How can you improve this?

My solution: the worst-case time, whether the item is in the table or not, is O(1000), where 1000 is the length of the unsorted list.

To improve it: (0) increase the number of buckets to more than 1 million; (1) give each bucket a second hash table that uses a different hash function, which would make lookups O(1); (2) give each bucket a binary search tree, which would make lookups O(log n).

A compromise can be made between space and time, keeping both within a reasonable range.
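As a concrete sketch of option (2), here is a minimal chained table (hypothetical code, not from the question) whose buckets are kept sorted, so each bucket is binary-searched rather than scanned linearly:

```python
import bisect

class SortedBucketTable:
    """Chained hash table; each bucket is a sorted list, so a lookup
    binary-searches the chain in O(log(chain length)) instead of
    scanning an unordered list in O(chain length)."""

    def __init__(self, num_buckets=997):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key):
        chain = self._bucket(key)
        i = bisect.bisect_left(chain, key)
        if i == len(chain) or chain[i] != key:  # skip duplicates
            chain.insert(i, key)

    def contains(self, key):
        chain = self._bucket(key)
        i = bisect.bisect_left(chain, key)
        return i < len(chain) and chain[i] == key
```

A balanced BST per bucket gives the same O(log n) search bound while avoiding the O(chain length) shifting cost that a sorted list pays on insertion.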

Any better ideas? Thank you.

+5
4 answers

The simplest and most obvious improvement would be to increase the number of buckets in the hash table to about 1.2 million, assuming your hash function can generate numbers in that range (which it usually can).
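The effect of that resize shows up directly in the expected chain length (the load factor); a quick back-of-the-envelope check:

```python
def average_chain_length(num_items, num_buckets):
    # Expected number of items per bucket under a uniform hash function.
    return num_items / num_buckets

# 997 buckets: an unsuccessful search scans about 1003 items on average.
print(average_chain_length(1_000_000, 997))        # ≈ 1003.0
# 1.2 million buckets: the expected chain length drops below 1.
print(average_chain_length(1_000_000, 1_200_000))  # ≈ 0.83
```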

+7

Another option is to replace the unordered list in each bucket with a balanced binary search tree. If N is the number of items and M is the number of buckets, a search then takes O(log(N/M)) time instead of the O(N/M) linear scan.
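For the numbers in the question, the gap between scanning a chain and searching a balanced tree inside it is large; a rough comparison count (assuming a balanced tree):

```python
import math

N, M = 1_000_000, 997
chain = N / M            # ≈ 1003 items per bucket
scan = chain             # unordered list: an unsuccessful search touches the whole chain
tree = math.log2(chain)  # balanced BST per bucket: O(log(N/M)) comparisons
print(f"linear scan ≈ {scan:.0f} comparisons, tree search ≈ {tree:.0f}")
```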

+3

With 1000 items in every bucket, each lookup ends in a long linear scan of the chain. The practical fix is to grow the table toward 1 million buckets and rehash, so the expected chain length comes back down toward a constant (amortized O(1) per lookup).

+1

> Suppose the table has 1 million items and 997 buckets of unordered lists, and the hash function distributes keys with equal probability (i.e., roughly 1000 elements per bucket).
>
> What is the worst-case time to search for an item that is not in the table? For one that is in the table? How can you improve this?

The answer to the first part (whether or not the item is in the table) is not O(1000). Since the number of buckets is fixed at 997 while the number of items N grows, each bucket holds about N/997 items, so a search is O(N/997) = O(N), not O(1000).

> My solution: the worst-case time, whether the item is in the table or not, is O(1000), where 1000 is the length of the unsorted list.

That is not how Big-O notation works: constants like 1000 are dropped, and the bound has to be expressed in terms of N.

> To improve it: (0) increase the number of buckets to more than 1 million; (1) give each bucket a second hash table that uses a different hash function, which would be O(1); (2) give each bucket a binary search tree, which would be O(lg n).

> A compromise can be made between space and time, keeping both within a reasonable range.

Increasing the bucket count beyond 1 million is the right instinct, but it is not free: each item's slot is its hash modulo the bucket count (hash % buckets), so changing the bucket count means every existing item must be re-placed. Rehashing is a full pass over the table and costs CPU time, so it should be done rarely, growing by a large factor each time.
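A minimal sketch of that rehash pass (hypothetical code, not from the answer): every key is re-placed because its slot depends on the bucket count:

```python
def rehash(buckets, new_size):
    """Rebuild a chained table with new_size buckets. Each key's slot is
    hash(key) % bucket_count, so changing the bucket count forces a full
    pass over every stored item -- the CPU cost of growing the table."""
    new_buckets = [[] for _ in range(new_size)]
    for chain in buckets:
        for key in chain:
            new_buckets[hash(key) % new_size].append(key)
    return new_buckets

# Grow a small table and watch the longest chain shrink.
old = [[] for _ in range(7)]
for k in range(100):
    old[hash(k) % 7].append(k)
new = rehash(old, 101)
print(max(len(c) for c in old), "->", max(len(c) for c in new))
```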

Hash tables inside hash tables are completely pointless and startlingly wasteful. It is much better to use that space to reduce collisions in the outer hash table.

0
