Зміст курсу
Algorithms and Data Structures Overview
Algorithms and Data Structures Overview
Hash Table
A hash table, also known as a hash map or associative array, is a data structure that stores key-value pairs. It offers efficient insertion, deletion, and lookup operations by mapping keys to corresponding values through a process called hashing.
![](https://codefinity-content-media.s3.eu-west-1.amazonaws.com/Java_OOP/Definition_1.png)
The hash table can be implemented with various data structures but is more commonly implemented with arrays and defines three operations: insertion, deletion, and searching for a given element.
Note
Python's dictionary data structure implements a hash table, offering efficient storage and retrieval of key-value pairs.
Basic operations time complexity
Operation | Best Case | Average Case | Worst Case |
---|---|---|---|
Insertion | O(1) | O(1) | O(n) |
Deletion | O(1) | O(1) | O(n) |
Lookup | O(1) | O(1) | O(n) |
Collisions
But it may happen that the hash values will be the same for two or more different keys. This is called collision. So we need ways to deal with this problem, and there are two major approaches to it.
The first is to use a chaining mechanism. If we have a collision (if there is already an element in the array slot with an index generated by the hash function), we insert the new element as the next element in the list associated with that array index or dictionary key.
In the chaining approach, the worst-case scenario occurs when a poorly designed hash function results in all keys hashing to the same index. This causes the hash table to essentially function as a regular array, leading to O(N)
time complexity for insert, search, and delete operations.
Chaining is typically employed when the number of keys is expected to greatly exceed the memory allocated for the underlying array, and the exact number of keys is uncertain. A well-designed hash function is crucial to ensure that operations perform at an average-case time complexity.
![](https://codefinity-content-media.s3.eu-west-1.amazonaws.com/212d3d3e-af15-4df9-bb13-5cbbb8114954/hash4.png)
Open addressing
The second approach is called open addressing. In this case, if the collision occurs and the value is already present in the slot with the index generated by the hash function, we generate a new index and try the corresponding slot until we find the one available.
We can implement various open addressing methods:
- linear probing (when we increment an index by 1);
- quadratic probing (when we increment an index by a degree of 2);
- double hashing (when we generate a new hash value based on the previous one).
![](https://codefinity-content-media.s3.eu-west-1.amazonaws.com/212d3d3e-af15-4df9-bb13-5cbbb8114954/hash3.png)
![](https://codefinity-content-media.s3.eu-west-1.amazonaws.com/212d3d3e-af15-4df9-bb13-5cbbb8114954/hash1!.png)
![](https://codefinity-content-media.s3.eu-west-1.amazonaws.com/212d3d3e-af15-4df9-bb13-5cbbb8114954/hash2!.png)
![](https://codefinity-content-media.s3.eu-west-1.amazonaws.com/212d3d3e-af15-4df9-bb13-5cbbb8114954/hash3!.png)
This is a more efficient approach compared to chaining, but it may happen that the values in our underlying array are distributed unequally, and there are large clusters present. In this case, the effectiveness of this approach decreases.
So, we can come to the conclusion that both approaches, in their effectiveness, rely heavily on good implementation of a hash function. Building a good hash function is a major problem in computer science. We want to have a hash function that provides us with an equal distribution of items in a data structure.
Все було зрозуміло?
Зміст курсу
Algorithms and Data Structures Overview
Algorithms and Data Structures Overview
Hash Table
A hash table, also known as a hash map or associative array, is a data structure that stores key-value pairs. It offers efficient insertion, deletion, and lookup operations by mapping keys to corresponding values through a process called hashing.
![](https://codefinity-content-media.s3.eu-west-1.amazonaws.com/Java_OOP/Definition_1.png)
The hash table can be implemented with various data structures but is more commonly implemented with arrays and defines three operations: insertion, deletion, and searching for a given element.
Note
Python's dictionary data structure implements a hash table, offering efficient storage and retrieval of key-value pairs.
Basic operations time complexity
Operation | Best Case | Average Case | Worst Case |
---|---|---|---|
Insertion | O(1) | O(1) | O(n) |
Deletion | O(1) | O(1) | O(n) |
Lookup | O(1) | O(1) | O(n) |
Collisions
But it may happen that the hash values will be the same for two or more different keys. This is called collision. So we need ways to deal with this problem, and there are two major approaches to it.
The first is to use a chaining mechanism. If we have a collision (if there is already an element in the array slot with an index generated by the hash function), we insert the new element as the next element in the list associated with that array index or dictionary key.
In the chaining approach, the worst-case scenario occurs when a poorly designed hash function results in all keys hashing to the same index. This causes the hash table to essentially function as a regular array, leading to O(N)
time complexity for insert, search, and delete operations.
Chaining is typically employed when the number of keys is expected to greatly exceed the memory allocated for the underlying array, and the exact number of keys is uncertain. A well-designed hash function is crucial to ensure that operations perform at an average-case time complexity.
![](https://codefinity-content-media.s3.eu-west-1.amazonaws.com/212d3d3e-af15-4df9-bb13-5cbbb8114954/hash4.png)
Open addressing
The second approach is called open addressing. In this case, if the collision occurs and the value is already present in the slot with the index generated by the hash function, we generate a new index and try the corresponding slot until we find the one available.
We can implement various open addressing methods:
- linear probing (when we increment an index by 1);
- quadratic probing (when we increment an index by a degree of 2);
- double hashing (when we generate a new hash value based on the previous one).
![](https://codefinity-content-media.s3.eu-west-1.amazonaws.com/212d3d3e-af15-4df9-bb13-5cbbb8114954/hash3.png)
![](https://codefinity-content-media.s3.eu-west-1.amazonaws.com/212d3d3e-af15-4df9-bb13-5cbbb8114954/hash1!.png)
![](https://codefinity-content-media.s3.eu-west-1.amazonaws.com/212d3d3e-af15-4df9-bb13-5cbbb8114954/hash2!.png)
![](https://codefinity-content-media.s3.eu-west-1.amazonaws.com/212d3d3e-af15-4df9-bb13-5cbbb8114954/hash3!.png)
This is a more efficient approach compared to chaining, but it may happen that the values in our underlying array are distributed unequally, and there are large clusters present. In this case, the effectiveness of this approach decreases.
So, we can come to the conclusion that both approaches, in their effectiveness, rely heavily on good implementation of a hash function. Building a good hash function is a major problem in computer science. We want to have a hash function that provides us with an equal distribution of items in a data structure.
Все було зрозуміло?