Apprendre LSM Trees and SSTables | Indexing and Search Structures

Glissez pour afficher le menu

When you deal with massive, write-heavy workloads — such as ingesting logs, metrics, or user activity streams—traditional storage structures like B-trees can become a bottleneck. Each write often triggers multiple random disk operations, leading to a phenomenon called write amplification: the process of writing the same data multiple times due to index maintenance and page splits. Write amplification not only slows down ingestion but also shortens the lifespan of storage hardware. To address these issues, Log-Structured Merge Trees (LSM trees) were designed with a write-optimized approach. Instead of updating disk structures immediately for every insert or update, LSM trees accumulate changes in memory, then periodically flush and merge them to disk in large, sequential operations. This strategy drastically reduces random writes and improves throughput for big data systems.


              1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071
            
import bisect

class SSTable:
    def __init__(self, data):
        # Data is a sorted list of (key, value) tuples
        self.data = sorted(data)

    def lookup(self, key):
        # Binary search for fast lookup
        keys = [k for k, _ in self.data]
        idx = bisect.bisect_left(keys, key)
        if idx < len(self.data) and self.data[idx][0] == key:
            return self.data[idx][1]
        return None

class LSMTree:
    def __init__(self, memtable_limit=4):
        self.memtable = []
        self.sstables = []
        self.memtable_limit = memtable_limit

    def insert(self, key, value):
        # Insert in sorted order in memtable
        bisect.insort(self.memtable, (key, value))
        if len(self.memtable) >= self.memtable_limit:
            self.flush_memtable()

    def flush_memtable(self):
        # Move sorted memtable to a new SSTable (simulating a disk write)
        print(f"Flushing memtable to SSTable: {self.memtable}")
        self.sstables.insert(0, SSTable(self.memtable))
        self.memtable = []
        self.compact()

    def compact(self):
        # Simple compaction: merge all SSTables into one if more than 2 exist
        if len(self.sstables) > 2:
            print("Compacting SSTables...")
            merged = {}
            for sstable in reversed(self.sstables):
                for k, v in sstable.data:
                    merged[k] = v
            merged_items = sorted(merged.items())
            self.sstables = [SSTable(merged_items)]
            print(f"Compacted SSTable: {merged_items}")

    def lookup(self, key):
        # Check memtable first
        idx = bisect.bisect_left([k for k, _ in self.memtable], key)
        if idx < len(self.memtable) and self.memtable[idx][0] == key:
            return self.memtable[idx][1]
        # Check SSTables in order (newest first)
        for sstable in self.sstables:
            value = sstable.lookup(key)
            if value is not None:
                return value
        return None

# Example usage
lsm = LSMTree(memtable_limit=3)
lsm.insert("apple", 1)
lsm.insert("banana", 2)
lsm.insert("cherry", 3)  # Triggers flush
lsm.insert("date", 4)
lsm.insert("elderberry", 5)
lsm.insert("fig", 6)     # Triggers flush and compaction

print("Lookup 'banana':", lsm.lookup("banana"))
print("Lookup 'fig':", lsm.lookup("fig"))
print("Lookup 'grape':", lsm.lookup("grape"))

The backbone of LSM tree persistence is the Sorted String Table (SSTable). An SSTable is an immutable, sorted file on disk that stores key-value pairs. Because SSTables are sorted, merging them during compaction is efficient, and lookups can use binary search for speed. When new data is flushed from the in-memory buffer (memtable), it becomes a new SSTable. Over time, background processes merge and compact multiple SSTables, discarding obsolete entries and keeping only the latest value for each key. This design enables high write throughput, efficient use of disk, and fast lookups, especially when combined with auxiliary structures like Bloom filters.

Study More

LSM trees are the foundation of several modern storage engines, including Cassandra, LevelDB, and RocksDB. Each system adapts the core LSM tree principles to fit its own use case, optimizing for factors like latency, throughput, and space efficiency. Exploring their documentation or source code can deepen your understanding of how LSM trees are implemented in real-world databases.

Tout était clair ?

Merci pour vos commentaires !

Section 2. Chapitre 2

Demandez à l'IA

Posez n'importe quelle question ou essayez l'une des questions suggérées pour commencer notre discussion

Section 2. Chapitre 2