Hash Tables and Key-Value Lookup
Hash tables are vital for fast key-value lookup, making them a cornerstone of modern databases and caching systems. By mapping keys to array indices with a hash function, hash tables let you insert, search, and delete data quickly. This direct access is what makes them so powerful for applications that demand high-speed retrieval, such as caching web sessions or indexing database records. When you store or retrieve a value, the hash function computes an index from the key, and the value is placed or found at that position in an underlying array. If two keys hash to the same index (a collision), strategies like chaining or open addressing resolve the conflict, ensuring every key-value pair remains accessible. This structure underpins high-performance systems where rapid data access is crucial.
class HashTable:
    def __init__(self, size=10):
        self.size = size
        self.table = [[] for _ in range(size)]  # Using chaining for collision resolution

    def _hash(self, key):
        return hash(key) % self.size

    def insert(self, key, value):
        idx = self._hash(key)
        for i, (k, v) in enumerate(self.table[idx]):
            if k == key:
                self.table[idx][i] = (key, value)  # Update existing key
                return
        self.table[idx].append((key, value))  # Insert new key-value pair

    def search(self, key):
        idx = self._hash(key)
        for k, v in self.table[idx]:
            if k == key:
                return v
        return None  # Key not found

    def delete(self, key):
        idx = self._hash(key)
        for i, (k, v) in enumerate(self.table[idx]):
            if k == key:
                del self.table[idx][i]
                return True
        return False  # Key not found


# Example usage:
ht = HashTable()
ht.insert("user1", "session1")
ht.insert("user2", "session2")
print(ht.search("user1"))  # Output: session1
ht.delete("user1")
print(ht.search("user1"))  # Output: None
Performance is one of the main advantages of hash tables. On average, insertion, search, and deletion take constant time, O(1), because the hash function computes the index directly. In the worst case, such as when many keys collide and land in the same bucket, operations can degrade to O(n), where n is the number of entries; with a well-designed hash function and a reasonable load factor, this is rare in practice. Real-world systems must also handle resizing: as the load factor grows, the table is typically doubled in size and every entry is rehashed, so an occasional insertion pays an O(n) rehashing cost while the amortized cost per operation stays O(1). Clustering, where many keys end up in the same region of the table, can also hurt performance, so careful hash function selection and load factor management are key to maintaining speed.
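As an illustration, here is a minimal sketch of load-factor-based resizing built on the HashTable class above. The 0.75 threshold, the doubling strategy, and the ResizableHashTable name are illustrative assumptions, not a prescribed standard.

# Sketch: resize (double and rehash) once the load factor exceeds a threshold.
class ResizableHashTable(HashTable):
    def __init__(self, size=10, max_load=0.75):
        super().__init__(size)
        self.count = 0            # number of stored key-value pairs
        self.max_load = max_load  # resize once count / size exceeds this

    def insert(self, key, value):
        idx = self._hash(key)
        is_new = all(k != key for k, _ in self.table[idx])
        super().insert(key, value)  # update-or-append as before
        if is_new:
            self.count += 1
            if self.count / self.size > self.max_load:
                self._resize(self.size * 2)  # double and rehash every entry

    def _resize(self, new_size):
        old_buckets = self.table
        self.size = new_size
        self.table = [[] for _ in range(new_size)]
        for bucket in old_buckets:
            for key, value in bucket:
                # Re-insert under the new modulus; indices generally change
                self.table[self._hash(key)].append((key, value))

The rehash step is what keeps buckets short: spreading existing entries over the larger array restores a low load factor, so lookups stay close to O(1).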
Distributed systems like Redis and DynamoDB rely on hashing to scale key-value lookup efficiently. They use techniques such as partitioning (assigning each key to a node based on its hash) and replication to distribute key-value pairs across many nodes, providing both speed and resilience for large-scale applications.
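As a toy sketch of the partitioning idea only (not how Redis or DynamoDB actually route keys, and with hypothetical node names), a client can pick a node by hashing the key:

# Toy sketch of hash-based partitioning: route each key to one of several
# nodes by hashing it. Real systems use more robust schemes (e.g. hash slots
# or consistent hashing) that minimize data movement when nodes change.
NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names

def node_for_key(key):
    # Deterministically map a key to a node index within this process
    return NODES[hash(key) % len(NODES)]

print(node_for_key("user1"))  # e.g. "node-b" (varies with Python's hash seed)

The weakness of plain modulo routing is that adding or removing a node remaps most keys, which is why production systems prefer consistent hashing or fixed hash-slot schemes.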