Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Hash Table | Advanced Data Structures
course content

Зміст курсу

Algorithms and Data Structures Overview

Hash TableHash Table

A hash table, also known as a hash map or associative array, is a data structure that stores key-value pairs. It offers efficient insertion, deletion, and lookup operations by mapping keys to corresponding values through a process called hashing.

Hashing is a process of converting input data into a fixed-size value, typically a numeric value, using a hash function. This value, known as the hash code or hash value, is used to index and retrieve data quickly in hash-based data structures such as hash tables.

The hash table can be implemented with various data structures but is more commonly implemented with arrays and defines three operations: insertion, deletion, and searching for a given element.

Note

Python's dictionary data structure implements a hash table, offering efficient storage and retrieval of key-value pairs.

Basic operations time complexity

Hash Table Time Complexities
Operation Best Case Average Case Worst Case
Insertion O(1) O(1) O(n)
Deletion O(1) O(1) O(n)
Lookup O(1) O(1) O(n)

Collisions

But it may happen that the hash values will be the same for two or more different keys. This is called collision. So we need ways to deal with this problem, and there are two major approaches to it.

The first is to use a chaining mechanism. If we have a collision (if there is already an element in the array slot with an index generated by the hash function), we insert the new element as the next element in the list associated with that array index or dictionary key.

In the chaining approach, the worst-case scenario occurs when a poorly designed hash function results in all keys hashing to the same index. This causes the hash table to essentially function as a regular array, leading to O(N) time complexity for insert, search, and delete operations.

Chaining is typically employed when the number of keys is expected to greatly exceed the memory allocated for the underlying array, and the exact number of keys is uncertain. A well-designed hash function is crucial to ensure that operations perform at an average-case time complexity.

Open addressing

The second approach is called open addressing. In this case, if the collision occurs and the value is already present in the slot with the index generated by the hash function, we generate a new index and try the corresponding slot until we find the one available.
We can implement various open addressing methods:

  • linear probing (when we increment an index by 1);
  • quadratic probing (when we increment an index by a degree of 2);
  • double hashing (when we generate a new hash value based on the previous one).

This is a more efficient approach compared to chaining, but it may happen that the values in our underlying array are distributed unequally, and there are large clusters present. In this case, the effectiveness of this approach decreases.

So, we can come to the conclusion that both approaches, in their effectiveness, rely heavily on good implementation of a hash function. Building a good hash function is a major problem in computer science. We want to have a hash function that provides us with an equal distribution of items in a data structure.

Which of the following is not a technique for resolving hash function collisions using open addressing?

Виберіть правильну відповідь

Все було зрозуміло?

Секція 3. Розділ 4
course content

Зміст курсу

Algorithms and Data Structures Overview

Hash TableHash Table

A hash table, also known as a hash map or associative array, is a data structure that stores key-value pairs. It offers efficient insertion, deletion, and lookup operations by mapping keys to corresponding values through a process called hashing.

Hashing is a process of converting input data into a fixed-size value, typically a numeric value, using a hash function. This value, known as the hash code or hash value, is used to index and retrieve data quickly in hash-based data structures such as hash tables.

The hash table can be implemented with various data structures but is more commonly implemented with arrays and defines three operations: insertion, deletion, and searching for a given element.

Note

Python's dictionary data structure implements a hash table, offering efficient storage and retrieval of key-value pairs.

Basic operations time complexity

Hash Table Time Complexities
Operation Best Case Average Case Worst Case
Insertion O(1) O(1) O(n)
Deletion O(1) O(1) O(n)
Lookup O(1) O(1) O(n)

Collisions

But it may happen that the hash values will be the same for two or more different keys. This is called collision. So we need ways to deal with this problem, and there are two major approaches to it.

The first is to use a chaining mechanism. If we have a collision (if there is already an element in the array slot with an index generated by the hash function), we insert the new element as the next element in the list associated with that array index or dictionary key.

In the chaining approach, the worst-case scenario occurs when a poorly designed hash function results in all keys hashing to the same index. This causes the hash table to essentially function as a regular array, leading to O(N) time complexity for insert, search, and delete operations.

Chaining is typically employed when the number of keys is expected to greatly exceed the memory allocated for the underlying array, and the exact number of keys is uncertain. A well-designed hash function is crucial to ensure that operations perform at an average-case time complexity.

Open addressing

The second approach is called open addressing. In this case, if the collision occurs and the value is already present in the slot with the index generated by the hash function, we generate a new index and try the corresponding slot until we find the one available.
We can implement various open addressing methods:

  • linear probing (when we increment an index by 1);
  • quadratic probing (when we increment an index by a degree of 2);
  • double hashing (when we generate a new hash value based on the previous one).

This is a more efficient approach compared to chaining, but it may happen that the values in our underlying array are distributed unequally, and there are large clusters present. In this case, the effectiveness of this approach decreases.

So, we can come to the conclusion that both approaches, in their effectiveness, rely heavily on good implementation of a hash function. Building a good hash function is a major problem in computer science. We want to have a hash function that provides us with an equal distribution of items in a data structure.

Which of the following is not a technique for resolving hash function collisions using open addressing?

Виберіть правильну відповідь

Все було зрозуміло?

Секція 3. Розділ 4
some-alt