What is hashtables?

  • What are they and how do they work?
  • Where are they used?
  • When should I (not) use them?

I have heard this word again and again, but I do not know its exact meaning.

I heard that they resolve associative arrays by sending the array key through a hash function that converts it to int and then uses a regular array. Am I right with that?

(Please note: this is not my homework, I also studied at school, but they teach us only the basics of computer science)

+7
hashtable
source share
3 answers

Wikipedia seems to be a pretty good answer to what they are.

You should use them when you want to look at values ​​at some index.

As for when you shouldn't use them ... when you don't want to look up values ​​by some index (for example, if all you want to do is iterate over them.)

+6
source share

You found out about it. This is a very good way of matching arbitrary things (keys) with arbitrary things (values). The idea is that you use a function (a hash function) that translates the key into the index into the array in which you store the values; The speed of the hash function is usually linear in key size, which is great when the size of the keys is much smaller than the number of entries (i.e. a typical case).

The hard bit is that hash functions are usually imperfect. (Ideal hash functions exist, but, as a rule, are very specific for specific applications and specific data sets, they are hardly ever worth it.) There are two approaches to solving this issue, and each of them requires storing a key with a value: one (open addressing) is to use a predefined pattern to look ahead from the location in the array with a hash for free space, and the other (chain) to store a linked list depending on each entry in the array (so you do a linear search sweat mu, hopefully short list). Cases of production code in which I read the source code, everyone used a chain with a dynamic rebuild of the hash table when the load factor is excessive.

+3
source share

Good hash functions are one-way functions that allow you to create a distributed value from any given input. This way you will get several unique values ​​for each input value. They are also repeatable, so any input will always generate the same output.

An example of a good hash function is SHA1 or SHA256.

Say you have a user database table. The columns are id , last_name , first_name , telephone_number and address .

Although any of these columns may have duplicates, suppose no row is the same.

In this case, id is simply the unique primary key of our creation (surrogate key). The id field does not actually contain any user data, because we could not find a natural key that was unique to users, but we use the id field to build foreign key relationships with other tables.

We could find such an entry in our database:

 SELECT * FROM users WHERE last_name = 'Adams' AND first_name = 'Marcus' AND address = '1234 Main St' AND telephone_number = '555-1212'; 

We need to search 4 different columns using 4 different indexes to find my record.

However, you can create a new hash column and store the hash value of all four columns together.

 String myHash = myHashFunction("Marcus" + "Adams" + "1234 Main St" + "555-1212"); 

You can get a hash value like AE32ABC31234CAD984EA8 .

You save this hash value as a column in the database and an index on it. Now you need to search only one index.

 SELECT * FROM users WHERE hash_value = 'AE32ABC31234CAD984EA8'; 

As soon as we have the identifier of the requested user, we can use this value to search for related data in other tables.

The idea is that the hash function is unloaded from the database server.

Collisions are unlikely. If two users have the same hash, most likely they have duplicate data.

+1
source share

All Articles