Another SO question got me thinking about this: many language runtimes hash strings to give them fast table lookups. Two examples of this are the Dictionary<TKey, TValue> in .NET and the dict ({}) structure in Python. Other languages certainly support such a mechanism; C++ has its own map, and LISP has an equivalent, as do most other modern languages.
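For instance, in Python (a trivial illustration; the dict hashes its string keys under the hood):

```python
ages = {}
ages["alice"] = 30       # hash("alice") selects the bucket
print(ages["alice"])     # lookup re-hashes the key: prints 30
print(hash("alice"))     # the underlying hash value (varies per run
                         # due to Python's hash randomization)
```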
In response to a question about whether hashing a string can be done in constant time, one SO member with 25 years of programming experience argued that anything can be hashed in constant time. My contention is that this is not true unless your particular application places a bound on the string length; that is, unless some constant K bounds the maximum length of a string.
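To make that argument concrete, here is a minimal sketch (K, the function name, and the multiplier 31 are all my own hypothetical choices):

```python
K = 64  # hypothetical application-specific cap on key length

def bounded_hash(s: str) -> int:
    # Only the first K characters feed the hash, so the work per call
    # is capped by K. This is "constant time" only because K bounds m,
    # not because hashing a string is inherently O(1).
    h = 0
    for ch in s[:K]:
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h
```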
I am familiar with the Rabin-Karp algorithm, which uses a hash function for its operation; however, the algorithm does not dictate a specific hash function to use, and the one the authors suggest is O(m), where m is the length of the hashed string.
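For illustration, here is a minimal rolling-hash sketch in the spirit of Rabin-Karp (BASE, MOD, and the function names are my own, not from the paper). Note that the initial hash is O(m), even though each subsequent slide of the window is O(1):

```python
BASE, MOD = 256, 1_000_003  # illustrative parameters

def initial_hash(window: str) -> int:
    # O(m): every character of the initial window feeds the hash.
    h = 0
    for ch in window:
        h = (h * BASE + ord(ch)) % MOD
    return h

def roll(h: int, old: str, new: str, top_weight: int) -> int:
    # O(1) given top_weight = BASE**(m-1) % MOD, precomputed once:
    # drop `old` from the front of the window and append `new`.
    h = (h - ord(old) * top_weight) % MOD
    return (h * BASE + ord(new)) % MOD

h = initial_hash("abc")
assert roll(h, "a", "d", pow(BASE, 2, MOD)) == initial_hash("bcd")
```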
I have seen other pages, such as this one (http://www.cse.yorku.ca/~oz/hash.html), that list several hashing algorithms, but each of them appears to iterate over the entire length of the string to compute its value.
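For example, the djb2 function from that page, ported to Python, makes the per-character loop explicit (the 32-bit mask stands in for C's fixed-width unsigned overflow):

```python
def djb2(s: str) -> int:
    # Port of the djb2 function from the cited page; the loop visits
    # every character, so the cost grows linearly with len(s).
    h = 5381
    for ch in s:
        h = ((h << 5) + h + ord(ch)) & 0xFFFFFFFF  # h * 33 + c
    return h
```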
From my comparatively limited reading on this subject, it seems that most associative arrays for string keys are actually built with a hash function driving some kind of tree under the hood. It could be an AVL tree or a red/black tree that points to the location of the value element in the key/value pair.
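Something like the following sketch is what I have in mind (entirely hypothetical; a plain BST stands in for the balanced tree, ordered by the pair (hash, key)):

```python
class Node:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.h = hash(key)            # computed once, at insert time
        self.left = self.right = None

def lookup(root, key):
    h = hash(key)                     # the Theta(m) step under discussion
    node = root
    while node is not None:
        # Tuple comparison is O(1) unless two hashes collide, in which
        # case the full keys break the tie.
        if (h, key) < (node.h, node.key):
            node = node.left
        elif (h, key) > (node.h, node.key):
            node = node.right
        else:
            return node.value
    return None
```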
Even with this tree structure, if we are to stay at Theta(log(n)), where n is the number of keys in the tree, we need a constant-time hash algorithm; otherwise, we incur the additive penalty of iterating over the string. Even though Theta(m) is overshadowed by Theta(log(n)) for indexes holding many keys, we cannot ignore it in a domain where the texts we run against are very large.
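A rough back-of-the-envelope with made-up sizes shows when each term of the additive cost Theta(m) + Theta(log n) dominates:

```python
import math

n = 1_000_000                   # keys in the index -> log2(n) ~ 20
for m in (16, 100_000):         # short keys vs. very long texts
    print(m, m + math.log2(n))  # 16 -> ~36; 100000 -> ~100020
                                # (the hashing term dominates)
```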
As far as I can tell, even algorithms like Rabin-Karp / Aho-Corasick are O(m) in the length of the string they process, so my question to SO is: is a genuinely constant-time hash on unbounded strings actually possible?