Hash table basics?

I am pretty confused about the basic concepts of a hash table. If I were to encode a hash, how could I start? What is the difference between a hash table and a regular array?

Basically, if someone answered this question, I think that all my questions will be answered: If I had 100 randomly generated numbers (like keys), how would I implement a hash table and why it was would be beneficial over an array?

Psuedo-code or Java will be appreciated as a learning tool ...

+52
java hashtable
Nov 12 '08 at 1:23
source share
11 answers

The answers so far have helped identify hash tables and explain some theories, but I think an example can help you better understand them.

What is the difference between a hash table and a regular array?

A hash table and an array are structures that allow you to store and retrieve data. Both allow you to specify the index and get the value associated with it. The difference, as Daniel Spivak noted, is that array indexes are sequential and hash table tables are based on the value of the data associated with them.

Why should I use a hash table?

A hash table can provide a very efficient way to search for items in large amounts of data, especially data that otherwise might not be easily searchable. (“Big” here means ginormous in the sense that it will take a long time to search sequentially).

If I were to code a hash, how could I start?

No problems. The easiest way is to invent an arbitrary mathematical operation that you can perform on data that returns the number N (usually an integer). Then use this number as an index in the bucket array and save your data in bucket # N The trick is to choose an operation that tends to place values ​​in different buckets so that it is easier to find them later.

Example: A large shopping center stores a database of cars and parking lots for its visitors to help customers remember where they are parked. The database stores make , color , license plate and parking location . After leaving the store, the buyer finds his car by entering its brand and color. The database returns a (relatively short) list of license plates and parking spaces. A quick scan allows you to find a buyer.

You can implement this with an SQL query:

 SELECT license, location FROM cars WHERE make="$(make)" AND color="$(color)" 

If the data was stored in an array, which is essentially just a list, you can imagine implementing a query by scanning the array for all the relevant records.

On the other hand, imagine a hash rule:

Add the ASCII character codes of all letters in make and color, divide them by 100, and use the remainder as the hash value.

This rule converts each item into a number from 0 to 99, essentially sorting the data into 100 buckets. Each time a client needs to find a car, you can make a hash and color to find one bucket of 100 that contains information. You immediately reduced your search by 100 times!

Now expand the example to huge amounts of data, for example, a database with millions of records, which are searched according to dozens of criteria. A “good” hash function will distribute the data in buckets in such a way as to minimize any additional searches, saving a significant amount of time.

+66
Nov 12 '08 at 2:57
source share

You must first understand what a hash function is. A hash function is a function that takes a key (for example, a string of arbitrary length) and returns a number as unique as possible. The same key must always return the same hash. A very simple string hashing function in java might look like

 public int stringHash(String s) { int h = s.length(); for(char c : s.toCharArray()) { h ^= c; } return h; } 

You can learn a good hash function at http://www.azillionmonkeys.com/qed/hash.html

The hash map now uses this hash value to put the value in an array. Simplified java method:

 public void put(String key, Object val) { int hash = stringHash(s) % array.length; if(array[hash] == null) { array[hash] = new LinkedList<Entry<String, Object> >(); } for(Entry e : array[hash]) { if(e.key.equals(key)){ e.value = val; return; } } array[hash].add(new Entry<String, Object>(key, val)); } 

(This card provides unique keys. Not all cards do.)

You can use two different hash keys for the same value, or two different hashes to map to the same array index. There are many methods to deal with this. The simplest is to use a linked list (or binary tree) for each index of the array. If the hash function is good enough, you will never need a linear search.

Now to find the key:

 public Object get(String key) { int hash = stringHash(key) % array.length; if(array[hash] != null) { for(Entry e : array[hash]) { if(e.key.equals(key)) return e.value; } } return null; } 
+45
Nov 12 '08 at 2:36
source share

Hashtables are associative. This is a huge difference from arrays, which are only linear data structures. With an array, you can do something like this:

 int[] arr = ... for (int i = 0; i < arr.length; i++) { System.out.println(arr[i] + 1); } 

Notice how you get the element from the array by specifying the exact memory offset ( i ). This contrasts with hash tables that allow you to store key / value pairs and then retrieve the value based on the key:

 Hashtable<String, Integer> table = new Hashtable<String, Integer>(); table.put("Daniel", 20); table.put("Chris", 18); table.put("Joseph", 16); 

In the above table, we can make the following call:

 int n = table.get("Chris"); 

... and rest assured that n will be evaluated at 18 .

I think this will probably answer most of your questions. Implementing a hash table is a pretty interesting topic, one with which Wikipedia treats pretty well .

+16
Nov 12 '08 at 1:30
source share

"I'm more interested in how Hash Tables look at the key and how the key is generated."

  • Hashing converts a key object to a number. This is called "hashing" - it makes a hash of the object. See hash function . For example, summing bytes of a string is a standard hash technique. You calculate the modulo 2 32 sum to save the hash in a manageable size. A hash always gives the same answer. This is O (1).

  • The number gives you a "slot" in a HashTable. For an arbitrary key object, the hash value computes the hash value. The hash value then gives you a slot in the table. Usually mod( hash, table size ) . This is O (1).

This is a common solution. Two numerical calculations, and you have chosen an arbitrary object as the key to an arbitrary object as a value. Few things can be as fast.

Converting from an object to a hash value occurs in one of these general ways.

  • If this is a "primitive" object of 4 bytes, then the original value of the object is a number.

  • The address of the object is 4 bytes, then the address of the object can be used as a hash value.

  • A simple hash function (MD5, SHA1, independently) accumulates the bytes of the object to create a 4-byte number. Advanced hashes are not simple sums of bytes, a simple sum does not reflect enough of all the original input bits.

The slot in the hash table is the mode (number, table size).

If this slot has the desired value, everything is ready. If this is not the desired value, you need to look elsewhere. There are several popular sensing algorithms for finding free space in a table. Linear is a simple search for the next free space. Quadratic is a non-linear jump around in search of a free slot. A random number generator (with a fixed seed) can be used to generate a series of probes that will distribute data evenly but arbitrarily.

Sounding algorithms are not O (1). If the table is large enough, the probability of collision is low, and the probes do not matter. If the table is too small, then a collision occurs and breakdown occurs. At this point, we are talking about “tuning and tuning” to balance sensing and table size to optimize performance. Usually we just make the table bigger.

See the hash table .

+8
Nov 12 '08 at 2:35
source share

Something that I have not seen, I especially noted:

The point of using a hash table over an array is performance.

Iteration through an array, as a rule, occurs from O (1) to O (x), where x is the number of elements in the array. However, the search time for your element will be extremely variable, especially if we are talking about hundreds of thousands of elements in an array.

A properly weighted hash table usually has an almost constant access time only over O (1), regardless of how many elements are in the hash table.

+6
Nov 12 '08 at 2:47
source share

You do not want to use a hash table for 100 randomly generated numbers.

A good way to think of hash tables is to think of value pairs. Let the students be used and say that everyone has a student identification number. In your program, you store information about students (names, phone numbers, accounts, etc.). You want to find all the student information using only basic information (for example, student name or ID).

Say you have 10,000 students. If you store them all in an array, you need to go through the entire array, comparing each student ID with the one you are looking for.

If instead you are a "hash" (see below) of their student identification number in a position in the array, you only need to look for a student whose numbers have the same hash. Much less effort to find what you want.

In this example, let’s say that the student identifiers are only 6 digits. Our hash function can only use the bottom 3 digits of the number as a "hash key". Thus, 232145 is hashed to the location of array 145. Thus, you only need an array of 999 elements (each element is a list of students).

This should be a good start for you. You should, of course, read the textbook or Wikipedia for this kind of information. But I suppose you have already done this and are tired of reading.

+4
Nov 12 '08 at 1:30
source share

Here, in a nutshell, how a hash table works.

Imagine you have a library full of books. If you stored books in an array, you would put each book in its place on a shelf, and then when someone asked you to find a book, you would go through all the shelves — rather slowly. If someone said "book # 12345", you could find it pretty easily.

Say, instead, you say that if the title of the book begins with "A", it goes on line 1. If the second letter is "B", it goes on line 1, rack 2. If the third letter is "C", it goes to line 1, rack 2, shelf 3 ... and so on, until you determine the position of the book. Then, based on the title of the book, you can know exactly where it should be.

Now there are some problems in the simplified hashing algorithm, which I described: some shelves will be overloaded, while others will remain empty, some books will be assigned to one slot. Thus, real hash functions are carefully designed to try to avoid such problems.

But this is the main idea.

+3
Nov 12 '08 at 2:38
source share

I will answer this part about the difference between a hash table and an array ... but since I have never implemented a hash algorithm for any import before, I will leave this to someone more knowledgeable :)

An array is just an ordered list of objects. The object itself does not matter ... the important thing is that if you want to list the objects in insertion order, it is always the same (which means that the first element always has index 0).

As for the hash table, which is indexed by keys, not the order ... I think that a basic search for hashing algorithms will give you much more information than I can ... Wikipedia has a very decent ... defines a "bucket" that keys are included in a quick search on arbitrary objects used as keys.

As for the benefits: if the insertion order is important, an array or some sort of ordered list is needed. If a quick search on an arbitrary key (using various hash functions) is important, then the hash table makes sense.

0
Nov 12 '08 at 1:31
source share

[This is a response to the comment made by me.yahoo.com/a above]

It depends on your hash function. Suppose your hash function hashes a word according to the length of your word, the key for chris will be 5. Similarly, the key for yahoo will also be 5. Now both values ​​(chris and yahoo) will be less than 5 (i.e. in the "bucket" with the key 5). This way you do not need to make the array equal to the size of your data.

0
Nov 12 '08 at 1:47
source share

In my opinion, they answered my question quite clearly and in different ways.

I would like to add another perspective (which may confuse the new reader as well)

At the level of least abstraction, arrays are just an adjacent block of memory. Given the starting address ( startAddress ), size ( sizeOfElement ) and index one element, the address of the element is calculated as:

 elementAddress = startAddress + sizeOfElement * index 

It is interesting to note that arrays can be abstracted / considered as hash tables using index as a key, and the above function is a hash function that calculates the location of a value in O (1)

0
May 24 '10 at 12:46
source share

A hash table is a data structure designed for quick retrieval.

Hash tables are invalid when the number of records is very small.

link

Some examples:

  import java.util.Collection; import java.util.Enumeration; import java.util.Hashtable; import java.util.Set; public class HashtableDemo { public static void main(String args[]) { // Creating Hashtable for example Hashtable companies = new Hashtable(); // Java Hashtable example to put object into Hashtable // put(key, value) is used to insert object into map companies.put("Google", "United States"); companies.put("Nokia", "Finland"); companies.put("Sony", "Japan"); // Java Hashtable example to get Object from Hashtable // get(key) method is used to retrieve Objects from Hashtable companies.get("Google"); // Hashtable containsKey Example // Use containsKey(Object) method to check if an Object exits as key in // hashtable System.out.println("Does hashtable contains Google as key: "+companies.containsKey("Google")); // Hashtable containsValue Example // just like containsKey(), containsValue returns true if hashtable // contains specified object as value System.out.println("Does hashtable contains Japan as value: "+companies.containsValue("Japan")); // Hashtable enumeration Example // hashtabl.elements() return enumeration of all hashtable values Enumeration enumeration = companies.elements(); while (enumeration.hasMoreElements()) { System.out.println("hashtable values: "+enumeration.nextElement()); } // How to check if Hashtable is empty in Java // use isEmpty method of hashtable to check emptiness of hashtable in // Java System.out.println("Is companies hashtable empty: "+companies.isEmpty()); // How to find size of Hashtable in Java // use hashtable.size() method to find size of hashtable in Java System.out.println("Size of hashtable in Java: " + companies.size()); // How to get all values form hashtable in Java // you can use keySet() method to get a Set of all the keys of hashtable // in Java Set hashtableKeys = companies.keySet(); // you can also get enumeration of all keys by using method keys() Enumeration hashtableKeysEnum = companies.keys(); // How to get all keys from hashtable in Java // There are two ways to get all values form hashtalbe first by using // Enumeration and second getting values ad Collection Enumeration hashtableValuesEnum = companies.elements(); Collection hashtableValues = companies.values(); // Hashtable clear example // by using clear() we can reuse an existing hashtable, it clears all // mappings. companies.clear(); } } 

Output:

 Does hashtable contains Google as key: true Does hashtable contains Japan as value: true hashtable values: Finland hashtable values: United States hashtable values: Japan Is companies hashtable empty: false Size of hashtable in Java: 3 
0
Aug 24 '13 at 10:27
source share



All Articles