Let us first revisit one of the most commonly used operations on data structures, i.e., searching.
Searching – Searching means finding a desired item in a data structure. Since we may have to check many indices to see whether the item is present, a naive search becomes an expensive operation with high time complexity.
Compare the search time complexities of some common data structures such as the Linked List, Array (sorted and unsorted), and Binary Search Tree:
- Linked List – O(n)
- Array (unsorted) – O(n)
- Array (Sorted) – O(log₂n)
- Binary Search Tree (balanced) – O(log₂n)
To minimize the time complexity of this search operation, we use the concept of Hashing.
What is Hashing?
In hashing, we create key-value pairs: the key identifies the item, and the value tells us the index of the location where the item is stored. So, to search for an item, we compute its key, jump directly to that location, and check whether the item is present.
This may seem a bit confusing at first, but the diagram given below makes it much clearer –
In this example, every item, i.e., the key, is mapped to the index key + 1. So, to search for an item, for example 3, we don't need to check all the elements; we just visit index 3 + 1 = 4, and if the item is present there we can print "Yes".
This hash table makes our work a lot easier by reducing the time complexity of search to constant time, i.e., O(1).
The mapping can also go the other way: the key is used to compute the index, and the value stores the data associated with that key. You may find this convention on other websites; it is equally correct.
The rule we used in this example was key + 1: every key was mapped to the index one past it. But this is not required; we can choose any mapping rule that suits our needs. This rule is known as the hash function.
The hash function can be any mathematical expression, such as modulo 10, square modulo 10, and so on.
One of the major problems we face while using this data structure is collision.
When mapping key-value pairs, it is possible that after applying the hash function, two or more values map to the same index.
For example, suppose we use modulo 10 as the hash function for the values 2, 5, 6, 13, 12, 24, and 32. Then 2 % 10, 12 % 10, and 32 % 10 are all equal to 2. So which value should occupy index 2?
So, for this, we have a variety of methods to resolve the problem of collision.
These methods are –
- Chaining
- Linear Probing
- Quadratic Probing
- Double Hashing
Chaining method in hashing
In this method, if more than one value maps to a given index, as we have seen in the example above, a chain is created at that index. The values 2, 12, and 32 all map to index 2, so index 2 holds all of these values as a list. Generally, a linked list is used to store the values at each index.
Linear probing in hashing
In linear probing, collision is resolved by checking the next slot.
h(k, i) = (h′(k) + i) mod m
where
i = {0, 1, 2, …}
h′(k) is an auxiliary hash function, and m is the table size.
If a collision occurs at h(k, 0), then h(k, 1) is checked next. In this way, the value of i is incremented linearly until an empty slot is found.
Quadratic Probing in hashing
It works similarly to linear probing, but the gap between probed slots grows quadratically (greater than one), using the following relation.
h(k, i) = (h′(k) + c₁i + c₂i²) mod m
where,
c₁ and c₂ are positive auxiliary constants (c₂ ≠ 0),
i = {0, 1, ….}
Double Hashing in data structures
If a collision occurs after applying the first hash function h₁(k), then a second hash function h₂(k) is used to compute the step size to the next slot.
h(k, i) = (h₁(k) + i·h₂(k)) mod m
However, resolving collisions is a tedious task with problems of its own. For example, if we are using linear probing and we check the next index, it is possible that it is also filled, forcing us to probe again and again. So, it is better to use a hash function that keeps collisions rare in the first place. Such a function is called a good hash function.
A good Hash Function –
A good hash function may not prevent collisions completely, but it can keep their number small.
Hash table implementation in data structure
Now, we are going to learn the code for creating the hash table and using it –
#include <iostream>
#include <list>
#include <utility>
using namespace std;

class HashTable {
    int capacity;
    // Each slot holds a chain of (key, data) pairs.
    list<pair<int, int>> *table;

public:
    HashTable(int c);
    void insertItem(int key, int data);
    void deleteItem(int key);

    // Returns 1 if n is prime, 0 otherwise.
    int checkPrime(int n) {
        if (n == 1 || n == 0)
            return 0;
        for (int i = 2; i * i <= n; i++) {
            if (n % i == 0)
                return 0;
        }
        return 1;
    }

    // Returns the smallest odd prime >= n; a prime capacity spreads keys better.
    int getPrime(int n) {
        if (n % 2 == 0)
            n++;
        while (!checkPrime(n))
            n += 2;
        return n;
    }

    int hashFunction(int key) { return key % capacity; }

    void displayHash();
};

HashTable::HashTable(int c) {
    capacity = getPrime(c);
    table = new list<pair<int, int>>[capacity];
}

void HashTable::insertItem(int key, int data) {
    int index = hashFunction(key);
    table[index].push_back({key, data});
}

void HashTable::deleteItem(int key) {
    int index = hashFunction(key);
    // Search the chain for the matching key, not the data.
    for (auto it = table[index].begin(); it != table[index].end(); ++it) {
        if (it->first == key) {
            table[index].erase(it);
            return;
        }
    }
}

void HashTable::displayHash() {
    for (int i = 0; i < capacity; i++) {
        cout << "table[" << i << "]";
        for (auto &x : table[i])
            cout << " --> (" << x.first << ", " << x.second << ")";
        cout << endl;
    }
}

int main() {
    int key[] = {112, 234, 123, 245, 678, 211};
    int data[] = {123, 432, 523, 43, 423, 111};
    int size = sizeof(key) / sizeof(key[0]);

    HashTable h(size);
    for (int i = 0; i < size; i++)
        h.insertItem(key[i], data[i]);

    h.deleteItem(12); // key 12 is not present, so nothing is removed
    h.displayHash();
    return 0;
}
Applications of Hash Table –
- Message Digest.
- Password Verification.
- Data Structures (Programming Languages).
- Compiler Operation.
- Rabin-Karp Algorithm.
- Linking File name and path together.
- Game Boards.
- Graphics.