My hash table is slower than binary search

I implemented binary search, linear search and hash table to compare each time complexity. The problem is that somehow my hash table is much slower than binary search, when I measure the time to find prime numbers. Below is my code:

    #include "HashTable.h"
    #include <algorithm> // for std::find

    // Make the hash table 20 times the number of prime numbers
    HashTable::HashTable(std::vector<int> primes)
    {
        int tablesize = primes.size() * 20;
        table = new std::list<int>[tablesize];
        size = tablesize;
        for (auto &prime : primes)
            this->insert(prime);
    }

    // Hash function
    int HashTable::hash(int key)
    {
        return key % size;
    }

    // Finds element
    int HashTable::find(int key)
    {
        // Get index from hash
        int index = hash(key);
        // Find element
        std::list<int>::iterator foundelement =
            std::find(table[index].begin(), table[index].end(), key);
        // If element has been found return index
        // If not, return -1
        if (foundelement != table[index].end())
            return index;
        else
            return -1;
    }

    // Adds element to hashtable
    void HashTable::insert(int element)
    {
        // Get index from hash and insert the element
        int index = hash(element);
        table[index].push_back(element);
    }

HashTable.h

    #ifndef HASHTABLE_H
    #define HASHTABLE_H

    #include <list>
    #include <iostream>
    #include <vector>

    class HashTable
    {
    private:
        // Each position in the hash table holds a list to store elements in case of collision
        std::list<int>* table;

        // Size of hash table
        int size;

        // Hash function that returns the array location for the given key
        int hash(int key);

    public:
        HashTable(int tablesize);
        HashTable(std::vector<int> primes);

        // Adds element to hash table
        void insert(int element);

        // Deletes an element by key
        void remove(int key);

        // Returns an element from hash table for a given key
        int find(int key);

        // Displays the hash table
        void printTable();

        // Displays a histogram to illustrate element distribution
        void printHistogram();

        // Returns the number of lists in hash table
        int getSize();

        // Returns the total number of elements in hash table
        int getNumberOfItems();

        // De-allocates all memory used for the hash table
        ~HashTable();
    };

    #endif
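The timing code is not shown above; a minimal sketch of the kind of harness meant, assuming a sorted primes vector, with std::binary_search standing in for my own binary search implementation (the accumulation into hits only exists so the compiler cannot optimise the lookups away):

    #include <algorithm>
    #include <chrono>
    #include <iostream>
    #include <vector>
    #include "HashTable.h"

    // Hypothetical harness: assumes `primes` is sorted ascending.
    void compareLookups(const std::vector<int>& primes)
    {
        HashTable ht(primes);
        long long hits = 0;

        auto t0 = std::chrono::steady_clock::now();
        for (int p : primes)
            hits += (ht.find(p) != -1);              // hash table lookup
        auto t1 = std::chrono::steady_clock::now();
        for (int p : primes)
            hits += std::binary_search(primes.begin(), primes.end(), p);
        auto t2 = std::chrono::steady_clock::now();

        auto us = [](auto d) {
            return std::chrono::duration_cast<std::chrono::microseconds>(d).count();
        };
        std::cout << "lookups found: " << hits << "\n"
                  << "hash table:    " << us(t1 - t0) << " us\n"
                  << "binary search: " << us(t2 - t1) << " us\n";
    }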

I already tried increasing the table size to rule out collisions, but I did not notice any difference.

This is the result

+6
3 answers

Some things that are suboptimal about the hash table implementation:

  • primes.size() * 20 is excessive: you will get far more cache misses than necessary; try a range of multipliers from 1 to ~2 to find the sweet spot

  • primes.size() * 20 is always even, and all the prime numbers you hash with key % size are odd (apart from 2), so you never hash onto half of the buckets, wasting space and degrading cache performance

  • you handle collisions with linked lists: this means you always follow at least one pointer away from the table's contiguous memory, which is slow, and on a collision you jump around in memory once per node in the list; using a std::vector<int> to store colliding values would limit the jumping to one memory area outside the hash table, or you could use closed hashing / open addressing to usually find the element in a nearby, probably already cached, part of the table (see the sketch after this list): my benchmarks found that to be about an order of magnitude faster for similar int values.
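To illustrate the last point, a minimal open-addressing sketch (not the code from the question): the ints live directly in one contiguous array and collisions are resolved by linear probing. The EMPTY sentinel of -1 is an assumption that works here because the keys are positive primes.

    #include <cstddef>
    #include <vector>

    // Linear probing: every lookup touches only the contiguous `slots`
    // array, so a probe sequence usually stays within a cache line or two.
    class ProbingTable {
        static constexpr int EMPTY = -1;        // assumes all keys are >= 0
        std::vector<int> slots;
    public:
        explicit ProbingTable(const std::vector<int>& keys)
            : slots(keys.size() * 2 + 1, EMPTY) // odd size, load factor ~0.5
        {
            for (int k : keys)
                insert(k);
        }
        void insert(int key) {
            std::size_t i = key % slots.size();
            while (slots[i] != EMPTY)           // walk to the next free slot
                i = (i + 1) % slots.size();
            slots[i] = key;
        }
        bool contains(int key) const {
            std::size_t i = key % slots.size();
            while (slots[i] != EMPTY) {
                if (slots[i] == key)
                    return true;
                i = (i + 1) % slots.size();
            }
            return false;
        }
    };

With a load factor around 0.5 the probe sequences stay short, and a miss is detected as soon as an empty slot is hit.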

+4

If your data is completely random, it can be difficult to find a good constant for the modulo operation. If your data follows some pattern, you can try running a test over a set of candidate constants to see which one distributes your data best.

In this post I showed how you can run such a large-scale test. As a result, my hash table averaged 1.5 comparisons per lookup, with a worst case of 14. The table contained 16,000 entries, approximately 2^14.
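A small sketch of what such a test can look like, assuming the same key % size hashing as in the question; the candidate sizes you pass in are up to you:

    #include <cstdio>
    #include <vector>

    // For each candidate table size, report the longest chain the data
    // would produce under hash(key) = key % size.
    void sweepTableSizes(const std::vector<int>& keys,
                         const std::vector<int>& candidates)
    {
        for (int size : candidates) {
            std::vector<int> counts(size, 0);
            int worst = 0;
            for (int k : keys) {
                int c = ++counts[k % size];
                if (c > worst)
                    worst = c;
            }
            std::printf("size %6d -> worst-case chain %d\n", size, worst);
        }
    }

Feeding it a handful of odd sizes around the intended capacity quickly shows which modulus spreads the data most evenly.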

+1

It all comes down to complexity: binary search is O(log n), while scanning a bucket's list in your find is linear in the chain length, so at some point the hash table becomes worse when you have a lot of collisions. For example, with roughly 16,000 elements binary search needs at most about 14 comparisons, so a chain only has to grow longer than that for the hash lookup to lose.

0
