Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Hash Table | Advanced Data Structures
Algorithms and Data Structures Overview
course content

Course Content

Algorithms and Data Structures Overview

Algorithms and Data Structures Overview

1. Introduction to ADS
2. List and Array
3. Advanced Data Structures
4. Graphs

bookHash Table

A hash table, also known as a hash map or associative array, is a data structure that stores key-value pairs. It offers efficient insertion, deletion, and lookup operations by mapping keys to corresponding values through a process called hashing.

The hash table can be implemented with various data structures but is more commonly implemented with arrays and defines three operations: insertion, deletion, and searching for a given element.

Note

Python's dictionary data structure implements a hash table, offering efficient storage and retrieval of key-value pairs.

Basic operations time complexity

Collisions

But it may happen that the hash values will be the same for two or more different keys. This is called collision. So we need ways to deal with this problem, and there are two major approaches to it.

The first is to use a chaining mechanism. If we have a collision (if there is already an element in the array slot with an index generated by the hash function), we insert the new element as the next element in the list associated with that array index or dictionary key.

12345678910111213141516171819202122232425262728293031323334353637383940
import numpy as np class HashTable: def __init__(self, size): # Initialize the hash table with a specified size self.size = size # Create a dictionary where each key corresponds to an index in the hash table # and each value is an empty numpy array self.table = {i: np.array([]) for i in range(size)} def hash_function(self, key): # Simple modular hashing function to determine the index for a given key return key % self.size def insert(self, key, value): # Compute the index using the hash function index = self.hash_function(key) # Check if the array at the computed index is empty if len(self.table[index]) == 0: # If empty, initialize the array with a tuple containing the key-value pair self.table[index] = np.array([(key, value)]) else: # If not empty, append the key-value pair to the existing array self.table[index] = np.append(self.table[index], np.array([(key, value)])) print(f"The element with value {value} inserted into a hash table at index {index}") # Define key-value pairs pairs = [ {'Phone': 12345, 'Name': 'Will'}, {'Phone': 13492, 'Name': 'Sam'}, {'Phone': 23770, 'Name': 'Mike'}, {'Phone': 12345, 'Name': 'Jack'} ] # Initialize the hash table hash_table = HashTable(size=7) # Insert key-value pairs into the hash table for pair in pairs: hash_table.insert(pair['Phone'], pair['Name'])
copy

In the chaining approach, the worst-case scenario occurs when a poorly designed hash function results in all keys hashing to the same index. This causes the hash table to essentially function as a regular array, leading to O(N) time complexity for insert, search, and delete operations.

Chaining is typically employed when the number of keys is expected to greatly exceed the memory allocated for the underlying array, and the exact number of keys is uncertain. A well-designed hash function is crucial to ensure that operations perform at an average-case time complexity.

Open addressing

The second approach is called open addressing. In this case, if the collision occurs and the value is already present in the slot with the index generated by the hash function, we generate a new index and try the corresponding slot until we find the one available.
We can implement various open addressing methods:

  • linear probing (when we increment an index by 1);
  • quadratic probing (when we increment an index by a degree of 2);
  • double hashing (when we generate a new hash value based on the previous one).
1234567891011121314151617181920212223242526272829303132333435363738394041
import numpy as np class HashTable: def __init__(self, size): # Define an empty dictionary to represent the hash table self.table = {} # Initialize the hash table with 'None' values for each index for i in range(size): self.table[i] = None def hash_function(self, key): # Simple modular hash function return key % len(self.table) def insert(self, key, value): index = self.hash_function(key) # Check if the current index is available if self.table[index] is None: self.table[index] = value else: # Perform linear probing until an available index is found while self.table[index] is not None: print(f'Index {index} is already occupied. Trying index {index + 1}') index += 1 self.table[index] = value print(f'The element with value {value} inserted into a hash table at index {index}') # Define key-value pairs pairs = [ {'Phone': 12345, 'Name': 'Will'}, {'Phone': 13492, 'Name': 'Sam'}, {'Phone': 23770, 'Name': 'Mike'}, {'Phone': 12345, 'Name': 'Jack'} ] # Initialize the hash table hash_table = HashTable(size=7) # Insert key-value pairs into the hash table for pair in pairs: hash_table.insert(pair['Phone'], pair['Name'])
copy

This is a more efficient approach compared to chaining, but it may happen that the values in our underlying array are distributed unequally, and there are large clusters present. In this case, the effectiveness of this approach decreases.

So, we can come to the conclusion that both approaches, in their effectiveness, rely heavily on good implementation of a hash function. Building a good hash function is a major problem in computer science. We want to have a hash function that provides us with an equal distribution of items in a data structure.

Which of the following is not a technique for resolving hash function collisions using open addressing?

Which of the following is not a technique for resolving hash function collisions using open addressing?

Select the correct answer

Everything was clear?

How can we improve it?

Thanks for your feedback!

Section 3. Chapter 4
some-alt