How does a hash table work? This is faster than "SELECT * from .."

Question

How does a hash table work? This is faster than "SELECT * from .."

Let's say I have:

  Key |  Indexes |  Key-values
 ---- + --------- + ------------
 001 |  100001 |  Alex
 002 |  100002 |  Micheal
 003 |  100003 |  Daniel

Say we want to find 001, how do you perform a quick search using a hash table?

Isn't that the way we use "SELECT * from .." in mysql? I read a lot, they say, "SELECT *" search from beginning to end, but there is no hash table? Why and how?

Using a hash table, do we reduce the records we are looking for? How?

Can anyone demonstrate how to insert and retrieve a hash table in mysql query code? eg,

SELECT * from table1 where hash_value="bla" ...

Another scenario: If the indices are similar to S0001, S0002, T0001, T0002, etc. In mysql, I can use:

 SELECT * from table WHERE value = S*

not the same and faster?

+6

hashtable mysql hash

roa3 Feb 12 '09 at 6:48

source share

5 answers

Don't worry about what MySQL does internally to quickly find records. The task of the database is to do this for you. Just run the query SELECT [columns] FROM table WHERE [condition]; and let the database generate a query plan for you. Note that you do not want to use SELECT * , because if you ever add a column to a table that splits all your old queries that relied on a certain number of columns in a certain order.

If you really want to know what is happening under the hood (it's good to know, but not to do it yourself: this is the goal of the database!), You need to know what indexes and how they work. If the table does not have an index in the columns participating in the WHERE clause, then, as you say, the database will have to search each row of the table to find those that match your condition. But if there is an index, the database will look for the index to find the exact location of the required rows and go directly to them. Indexes are usually implemented as B + -trees , a type of search tree that uses very few comparisons to search for a specific item. Finding a B-tree for a specific key is very fast. MySQL can also use hash indexes, but they are generally slower to use in a database. Hash indexes usually work well only on long keys (especially character strings), since they reduce the key size to a fixed hash size. For data types such as integers and real numbers that have well-defined ordering and fixed lengths, the ease of finding the B-tree usually provides better performance.

You can see the chapters in the MySQL User Guide and PostgreSQL Indexing Guide .

+3

kquinn Feb 12 '09 at 7:42

source share

http://en.wikipedia.org/wiki/Hash_table

Hash tables can be used as in-memory data structures. Hash tables can also be used for use with persistent data structures; database indexes sometimes use hash-based disk-based data structures , although balanced trees are more popular.

+1

Jeff atwood Feb 12 '09 at 7:45

source share

Hash tables are great for posting entries at O (1) cost, where the key (which is used for hashing) is already known. They are widely used both in collection libraries and in database engines. You should be able to find a lot of information about them on the Internet. Why don't you start with Wikipedia or just do a Google search?

I do not know the details of mysql. If there is a structure called a hash table, this is likely to be a kind of table that uses hashing to find keys. I'm sure someone else will tell you about this. =)

EDIT: (in response to the comment)

Ok I will try to make a very simplified explanation: a hash table is a table in which entries are located based on the key function. For example, say you want to store recruitment information. If you store it in a regular unsorted array, you will need to iterate over the elements sequentially to find the record you are looking for. On average, this will require N / 2 comparisons.

If instead you put all records in indexes based on the first character of the face name. (A = 0, B = 1, C = 2, etc.), you can immediately find the correct entry as long as you know your name. This is the main idea. You probably understand that in order to support several records having the same first letter, special processing is required (renaming or permission of record lists). If you have a hash table with a good location, you should be able to directly access the item you are looking for. This means an approximate comparison with the rejection of the special processing that I just mentioned.

0

Stein norheim Feb 12 '09 at 7:01

source share

I assume that you can use the hash function to get the identifier you want to select. how

SELECT * FROM table WHERE value = hash_fn (whatever_input_you_build_your_hash_value_from)

Then you do not need to know the identifier of the row you want to select, and you can make an exact request. Since you know that a string will always have the same identifier due to input, you create a hash value form and you can always recreate this identifier through a hash function.

However, this is not always true depending on the size of the table and the maximum number of hash values (you often have an “X-hash size table” somewhere in your hash). To take care of this, you must have a deterministic strategy that you use every time you get two values with the same identifier. You should check Wikipedia for more information about this strategy called collision handling and should be mentioned in the same article as hash tables.

MySQL probably uses hash tables somewhere because of the mentioned function O (1) norheim.se (up).

0

jowi Feb 12 '09 at 7:37

source share

Daniel Earwicker · Accepted Answer · 2009-02-12T08:59:49+0000

A simple hash table works by storing items in multiple lists, not just one. It uses a very fast and repeatable (i.e. nonrandom) method to select which list each item contains. Therefore, when it is time to find the element again, it repeats this method to find out which list to look at, and then performs a regular (slow) linear search in this list.

By dividing the items into 17 lists, the search will be 17 times faster, which is a good improvement.

Although, of course, this is true only if the lists are approximately the same length, so it is important to choose a good method for distributing elements between lists.

In your example table, the first column is the key, we need to find the element. And let's assume that we will maintain 17 lists. To insert something, we perform an operation on a key called hash. It just turns the key into a number. It does not return a random number, because it should always return the same number for the same key. But at the same time, the numbers should be "widely distributed."

Then we take the resulting number and usage module to reduce it to the size of our list:

 Hash(key) % 17

All this happens very quickly. Our lists are in an array, therefore:

 _lists[Hash(key % 17)].Add(record);

And then, to find the item using this key:

 Record found = _lists[Hash(key % 17)].Find(key);

Please note that each list can only be any type of container or a linked list class that you write manually. When we run Find on this list, it works in a slow way (examine the key of each record).

How does a hash table work? This is faster than "SELECT * from .."

More articles: