Runtime to insert n elements into an empty hash table

People say it's amortized O(1) to insert into a hash table, so inserting n elements should be O(n). That isn't quite true for large n, however, since, as an answerer put it: "All you need to satisfy expected amortized O(1) is to expand the table and rehash everything with a new random hash function any time there is a collision."

So: what is the expected runtime of inserting n elements into a hash table? I realize this probably depends on the implementation, so please mention what type of implementation you are talking about.

For example, if there are log(n) equally spaced collisions, and each collision takes O(k) to resolve, where k is the current size of the hash table, then you'd have this recurrence relation:

T(n) = T(n/2) + n/2 + n/2 

(i.e. you spend T(n/2) inserting the first n/2 elements, then you have one collision that takes n/2 to resolve, then you do the remaining n/2 inserts without a collision). That still works out to O(n), so yay. But is this reasonable?
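For what it's worth, if the recurrence really had that shape it would just telescope into a geometric series (my own arithmetic, not taken from any answer):

    T(n) = T(n/2) + n
         = n + n/2 + n/4 + ... + 1
        <= 2n
         = O(n)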

+4
4 answers

It depends entirely on how inefficient your rehashing is. Specifically, if you can properly estimate the expected size of the hash table the second time around, your runtime still approaches O(n). In effect, you have to specify how inefficient your rehash-size calculation is before you can determine the expected order.
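As a concrete illustration (a minimal toy sketch, not any real library's implementation), here is an open-addressing hash set that doubles its capacity whenever the load factor reaches 0.5 and counts how many elements all of the rehashes move in total; because the capacities form a geometric series, that total stays below 2n:

    # Toy sketch: hash set that doubles at load factor 0.5 and counts
    # the total work done by all rehashes while inserting n elements.
    class DoublingHashSet:
        def __init__(self):
            self._cap = 8
            self._slots = [None] * self._cap
            self._size = 0
            self.rehash_work = 0              # elements moved across all rehashes

        def _index(self, key):
            # linear probing, kept deliberately simple
            i = hash(key) % self._cap
            while self._slots[i] is not None and self._slots[i] != key:
                i = (i + 1) % self._cap
            return i

        def _grow(self):
            old = [k for k in self._slots if k is not None]
            self._cap *= 2
            self._slots = [None] * self._cap
            self.rehash_work += len(old)
            for k in old:
                self._slots[self._index(k)] = k

        def add(self, key):
            if self._size * 2 >= self._cap:   # load factor >= 0.5: double
                self._grow()
            i = self._index(key)
            if self._slots[i] is None:
                self._slots[i] = key
                self._size += 1

    s = DoublingHashSet()
    n = 100_000
    for x in range(n):
        s.add(x)
    print(n, s.rehash_work)                   # rehash work is 4 + 8 + 16 + ... < 2n

If instead of doubling you grew by only a constant number of slots per rehash, the same counter would grow quadratically, which is the kind of inefficient size estimate referred to above.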

+5

People say it's amortized O(1) to insert into a hash table.

From a theoretical standpoint, it's expected amortized O(1).

Hash tables are fundamentally a randomized data structure, in the same sense that quicksort is a randomized algorithm. You need to generate your hash functions with some randomness, or else there exist pathological inputs which are not O(1).
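For instance (a hypothetical sketch, not taken from the answer), a hash function drawn at run time from the classic universal family ((a*x + b) mod p) mod m looks like this; because a and b are random, no fixed input is bad for every possible choice:

    import random

    P = (1 << 61) - 1                  # a Mersenne prime larger than the keys

    def make_hash(m):
        # draw a random member of the universal family ((a*x + b) mod p) mod m
        a = random.randrange(1, P)
        b = random.randrange(0, P)
        return lambda x: ((a * x + b) % P) % m

    h = make_hash(1024)
    print(h(42), h(43))                # bucket indices in the range [0, 1024)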

You can achieve expected amortized O(1) using dynamic perfect hashing:

The naive idea I originally posted was to rehash with a new random hash function on every collision. (See also perfect hash functions.) The problem with this is that it requires O(n^2) space, by the birthday paradox.
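Rough back-of-the-envelope version of that bound (my own arithmetic): n keys hashed uniformly into m slots give about n(n-1)/(2m) expected colliding pairs, so keeping the expected number of collisions below a constant forces m to be on the order of n^2 slots.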

The solution is to have two hash tables, with the second table for collisions; resolve collisions on that second table by rebuilding it. That table will hold O(\sqrt{n}) elements, so it would grow to O(n) size.
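A very rough toy sketch of that two-table idea (my own simplification under the stated assumptions, not the scheme from the literature): a primary table plus a small secondary table, sized quadratically in its element count, that is rebuilt with a fresh random hash function whenever it suffers an internal collision.

    import random

    P = (1 << 61) - 1

    def make_hash(m):
        a, b = random.randrange(1, P), random.randrange(0, P)
        return lambda x: ((a * hash(x) + b) % P) % m

    class TwoTableHashSet:
        """Toy sketch: primary table plus a quadratically sized secondary
        table for collisions, rebuilt with a new random hash on demand."""

        def __init__(self, capacity=1024):
            self._cap = capacity
            self._h1 = make_hash(capacity)
            self._primary = [None] * capacity
            self._overflow = []            # keys that collided in the primary
            self._rebuild_secondary()

        def _rebuild_secondary(self):
            # quadratic sizing makes a random hash collision-free with
            # constant probability, so the expected number of retries is O(1)
            m = max(4, len(self._overflow) ** 2)
            while True:
                h = make_hash(m)
                table = [None] * m
                if all(self._place(table, h, k) for k in self._overflow):
                    self._h2, self._secondary = h, table
                    return

        @staticmethod
        def _place(table, h, key):
            i = h(key)
            if table[i] is not None:
                return False               # internal collision: retry
            table[i] = key
            return True

        def add(self, key):
            i = self._h1(key)
            if self._primary[i] is None or self._primary[i] == key:
                self._primary[i] = key
            elif key not in self._overflow:
                self._overflow.append(key)
                self._rebuild_secondary()

        def __contains__(self, key):
            return (self._primary[self._h1(key)] == key
                    or self._secondary[self._h2(key)] == key)

In this toy version the primary table never grows; a real dynamic perfect hashing scheme also rebuilds the whole structure as n grows, which is where the amortized analysis comes in.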

In practice, you often just use a fixed hash function because you can assume (or don't care whether) your input is pathological, much like you often quicksort without randomizing the input first.

+5

All O(1) says is that the operation is performed in constant time, and that the cost doesn't depend on the number of elements in your data structure.

In simple terms, this means you have to pay the same cost no matter how big your data structure is.

In practice, this means that simple data structures such as trees are usually more efficient when you don't need to store much data. In my experience I find trees faster up to ~1k elements (32-bit integers), and then hash tables take over. But as usual, YMMV.

+1

Why not just run some tests on your system? Maybe if you post the source, we can go back and test it on our systems, and we could really shape this into a very useful discussion.

It is not just the implementation but also the environment that decides how much time the algorithm actually takes. You can, however, look around for benchmarking samples. The problem with me posting my results is that they would be of no use, because people have no idea what else is running on my system, how much RAM is free right now, and so on. You can only ever get a broad idea, and that's about as good as what big-O gives you.
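For reference, a bare-bones timing harness along those lines might look like this (assuming Python's built-in dict; the absolute numbers only mean something on your own machine, but the per-element cost should stay roughly flat if insertion really is amortized O(1)):

    import timeit

    def insert_n(n):
        # insert n integer keys into an initially empty dict
        d = {}
        for i in range(n):
            d[i] = i

    for n in (10_000, 100_000, 1_000_000):
        t = timeit.timeit(lambda: insert_n(n), number=5) / 5
        print(f"n={n:>9}  total={t:.4f}s  per element={t / n * 1e9:.1f} ns")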

0