Course Content
Algorithms and Data Structures Overview
Algorithms and Data Structures Overview
Hash Table
A hash table, also known as a hash map or associative array, is a data structure that stores key-value pairs. It offers efficient insertion, deletion, and lookup operations by mapping keys to corresponding values through a process called hashing.
The hash table can be implemented with various data structures but is more commonly implemented with arrays and defines three operations: insertion, deletion, and searching for a given element.
Note
Python's dictionary data structure implements a hash table, offering efficient storage and retrieval of key-value pairs.
Basic operations time complexity
Collisions
But it may happen that the hash values will be the same for two or more different keys. This is called collision. So we need ways to deal with this problem, and there are two major approaches to it.
The first is to use a chaining mechanism. If we have a collision (if there is already an element in the array slot with an index generated by the hash function), we insert the new element as the next element in the list associated with that array index or dictionary key.
import numpy as np class HashTable: def __init__(self, size): # Initialize the hash table with a specified size self.size = size # Create a dictionary where each key corresponds to an index in the hash table # and each value is an empty numpy array self.table = {i: np.array([]) for i in range(size)} def hash_function(self, key): # Simple modular hashing function to determine the index for a given key return key % self.size def insert(self, key, value): # Compute the index using the hash function index = self.hash_function(key) # Check if the array at the computed index is empty if len(self.table[index]) == 0: # If empty, initialize the array with a tuple containing the key-value pair self.table[index] = np.array([(key, value)]) else: # If not empty, append the key-value pair to the existing array self.table[index] = np.append(self.table[index], np.array([(key, value)])) print(f"The element with value {value} inserted into a hash table at index {index}") # Define key-value pairs pairs = [ {'Phone': 12345, 'Name': 'Will'}, {'Phone': 13492, 'Name': 'Sam'}, {'Phone': 23770, 'Name': 'Mike'}, {'Phone': 12345, 'Name': 'Jack'} ] # Initialize the hash table hash_table = HashTable(size=7) # Insert key-value pairs into the hash table for pair in pairs: hash_table.insert(pair['Phone'], pair['Name'])
In the chaining approach, the worst-case scenario occurs when a poorly designed hash function results in all keys hashing to the same index. This causes the hash table to essentially function as a regular array, leading to O(N)
time complexity for insert, search, and delete operations.
Chaining is typically employed when the number of keys is expected to greatly exceed the memory allocated for the underlying array, and the exact number of keys is uncertain. A well-designed hash function is crucial to ensure that operations perform at an average-case time complexity.
Open addressing
The second approach is called open addressing. In this case, if the collision occurs and the value is already present in the slot with the index generated by the hash function, we generate a new index and try the corresponding slot until we find the one available.
We can implement various open addressing methods:
- linear probing (when we increment an index by 1);
- quadratic probing (when we increment an index by a degree of 2);
- double hashing (when we generate a new hash value based on the previous one).
import numpy as np class HashTable: def __init__(self, size): # Define an empty dictionary to represent the hash table self.table = {} # Initialize the hash table with 'None' values for each index for i in range(size): self.table[i] = None def hash_function(self, key): # Simple modular hash function return key % len(self.table) def insert(self, key, value): index = self.hash_function(key) # Check if the current index is available if self.table[index] is None: self.table[index] = value else: # Perform linear probing until an available index is found while self.table[index] is not None: print(f'Index {index} is already occupied. Trying index {index + 1}') index += 1 self.table[index] = value print(f'The element with value {value} inserted into a hash table at index {index}') # Define key-value pairs pairs = [ {'Phone': 12345, 'Name': 'Will'}, {'Phone': 13492, 'Name': 'Sam'}, {'Phone': 23770, 'Name': 'Mike'}, {'Phone': 12345, 'Name': 'Jack'} ] # Initialize the hash table hash_table = HashTable(size=7) # Insert key-value pairs into the hash table for pair in pairs: hash_table.insert(pair['Phone'], pair['Name'])
This is a more efficient approach compared to chaining, but it may happen that the values in our underlying array are distributed unequally, and there are large clusters present. In this case, the effectiveness of this approach decreases.
So, we can come to the conclusion that both approaches, in their effectiveness, rely heavily on good implementation of a hash function. Building a good hash function is a major problem in computer science. We want to have a hash function that provides us with an equal distribution of items in a data structure.
Thanks for your feedback!