Choosing a data structure for very large data

Question

Choosing a data structure for very large data

I have x (millions) of positive integers, where their values can be just as valid (+2,147,483,647). Assuming they are unique, what is the best way to store them for an intensive search program.

So far, I have been thinking about using a binary AVL tree or hash table, where an integer is the key to the displayed data (name). However, I'm not sure if I can implement such large keys in such a large number with a hash table (would not create a load factor> 0.8 in addition to the tendency to collide?)

May I get some recommendations as to which data structure might be appropriate for my situation.

+5

performance hashtable data-structures lookup avl-tree

Carlos Nov 24 '10 at 1:34

source share

5 answers

B-? log_m(n) log_(m/2)(n), , m 8-10 , 10.

+2

Actorclavilis 24 . '10 1:55

-, , . , . Bentley Programming Pearls .

+2

gsb 05 . '13 16:00

, , , . - O (1), , , , , .

, int, - .

+1

Michael Peddicord 24 . '10 1:55

-. , (, ).

32- , , set, map, hash_set ++. 4- , 100%. , , . , .

, , . O (log n) O (1), "" , - . C bsearch(), , .

edit: , " ()". ? ? , . , , 10 , " "; , , .

0

Javier 24 . '10 2:37

Jeffrey Hantin · Accepted Answer · 2010-11-24T01:55:46+0000

The choice of structure depends largely on how much memory you have. I guess based on the description you need to look for, but do not iterate over them, find nearby or other similar operations.

Best of all is probably a hash pivot table. By placing hash collisions in buckets and storing separate arrays in a bucket for keys and values, you can both reduce the size of the table and use the processor cache acceleration when searching in the bucket. A linear search in a bucket may even end faster than a binary search!

AVL , , . , , . , B- - , B-tree, .

Choosing a data structure for very large data

More articles: