Finding and Fixing Memory Leaks
A memory leak in Python is not the same as one in C. The interpreter manages allocation, so memory is not lost to a forgotten free() and pure Python code will not access freed memory. Instead, a leak means that objects stay reachable longer than necessary, and the process grows until it exhausts available RAM. The most common causes are unbounded caches, forgotten references in global scope, and reference cycles involving __del__.
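To make "alive longer than necessary" concrete, here is a minimal sketch that uses a weak reference to observe an object's lifetime. The names Payload, registry, and handle_request are hypothetical, and the behavior shown assumes CPython, where reference counting frees an object as soon as its last strong reference disappears.

```python
import weakref


class Payload:
    """Hypothetical stand-in for any object a request might allocate."""


registry = []  # Hypothetical module-level container that is never pruned


def handle_request():
    payload = Payload()
    registry.append(payload)     # Forgotten strong reference
    return weakref.ref(payload)  # Weak reference: does not keep payload alive


probe = handle_request()
alive_while_registered = probe() is not None

registry.clear()  # Drop the last strong reference
alive_after_clear = probe() is not None

print(alive_while_registered, alive_after_clear)
```

On CPython this is expected to print True followed by False: the object survives only because the global list still refers to it.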
Pattern 1: Unbounded Cache Growth
A dictionary used as a cache with no eviction policy grows indefinitely:
```python
import tracemalloc

# Simulating an unbounded cache leak
result_cache = {}

def compute_report(report_id):
    if report_id not in result_cache:
        result_cache[report_id] = list(range(750))  # Expensive computation
    return result_cache[report_id]

tracemalloc.start()
snapshot_before = tracemalloc.take_snapshot()

for report_id in range(2500):
    compute_report(report_id)  # Cache grows without bound

snapshot_after = tracemalloc.take_snapshot()
stats = snapshot_after.compare_to(snapshot_before, "lineno")
print(stats[0])  # Shows the cache line as the top allocator
tracemalloc.stop()
```
Fix – use functools.lru_cache or limit the cache size manually:
```python
import functools

# Bounded cache with automatic eviction of the least recently used entry
@functools.lru_cache(maxsize=256)
def compute_report(report_id):
    return list(range(750))  # Same expensive computation as in the leaking version
```
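The manual alternative can be sketched with collections.OrderedDict. This is one possible approach rather than the only one: MAX_ENTRIES and the choice to evict the least recently used entry first are assumptions made for illustration.

```python
from collections import OrderedDict

MAX_ENTRIES = 256  # Assumed limit; tune to the workload

result_cache = OrderedDict()


def compute_report(report_id):
    if report_id in result_cache:
        result_cache.move_to_end(report_id)  # Mark as most recently used
        return result_cache[report_id]

    report = list(range(750))  # Stand-in for the expensive computation
    result_cache[report_id] = report
    if len(result_cache) > MAX_ENTRIES:
        result_cache.popitem(last=False)  # Evict the least recently used entry
    return report


for report_id in range(2500):
    compute_report(report_id)

print(len(result_cache))  # Never exceeds MAX_ENTRIES
```

When using functools.lru_cache instead, compute_report.cache_info() reports hits, misses, maxsize, and currsize, which is a quick way to confirm that the cache stays bounded.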
Pattern 2: Globals Accumulating State
Module-level lists or dicts that append without clearing are a common source of growth:
```python
import objgraph  # Third-party package: pip install objgraph

event_log = []  # Global: never cleared

def process_event(event_id):
    event_log.append({"id": event_id, "data": list(range(100))})

objgraph.show_growth()  # First call records the baseline object counts

for event_id in range(1000):
    process_event(event_id)

objgraph.show_growth()  # Shows list and dict growth
```
Fix – use a bounded deque or clear periodically:
```python
from collections import deque

event_log = deque(maxlen=500)  # Automatically evicts old entries

def process_event(event_id):
    event_log.append({"id": event_id, "data": list(range(100))})
```
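When old entries have to be handed off rather than silently dropped, periodic clearing may fit better than a deque. The sketch below flushes the log once it reaches a threshold. FLUSH_THRESHOLD and flush_events are hypothetical names, and the flush step here only counts entries where a real system would write them to disk or a queue.

```python
FLUSH_THRESHOLD = 500  # Assumed batch size

event_log = []
flushed_total = 0


def flush_events():
    """Hypothetical sink: hand the batch off, then release the references."""
    global flushed_total
    flushed_total += len(event_log)
    event_log.clear()  # Clear in place so other references see the empty list


def process_event(event_id):
    event_log.append({"id": event_id, "data": list(range(100))})
    if len(event_log) >= FLUSH_THRESHOLD:
        flush_events()


for event_id in range(1200):
    process_event(event_id)

print(len(event_log), flushed_total)
```

Clearing in place with event_log.clear(), rather than rebinding the name to a new list, matters when other modules hold a reference to the same list.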
Pattern 3: Reference Cycles with __del__
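Before Python 3.4, objects with a __del__ method that were part of a reference cycle were never collected by the GC; they were left in gc.garbage instead. Python 3.4 and later (PEP 442) do collect them, but a cycle is still freed only when the cyclic collector runs, not as soon as the last external reference disappears. That delays collection and raises peak memory: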
```python
import gc

class Pipeline:
    def __init__(self, pipeline_id):
        self.pipeline_id = pipeline_id
        self.next_stage = None

    def __del__(self):
        print(f"Pipeline {self.pipeline_id} deleted")

stage_a = Pipeline("extract")
stage_b = Pipeline("transform")
stage_a.next_stage = stage_b
stage_b.next_stage = stage_a  # Cycle

del stage_a
del stage_b

print("Before gc.collect()")
gc.collect()  # Forces collection of the cycle
print("After gc.collect()")
```
Fix – break cycles explicitly or use weakref for back-references.
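A minimal sketch of the weakref approach, assuming a variant of the Pipeline class in which the forward link stays strong and the backward link is stored as a weak reference. The prev_stage property and the deleted list are names introduced for illustration. With no strong cycle, CPython's reference counting can free each object as soon as its last strong reference goes away, without waiting for the cyclic collector.

```python
import gc
import weakref

deleted = []  # Records finalization order for inspection


class Pipeline:
    def __init__(self, pipeline_id):
        self.pipeline_id = pipeline_id
        self.next_stage = None   # Strong forward reference
        self._prev_stage = None  # Weak back-reference, set via the property

    @property
    def prev_stage(self):
        # Dereference the weak reference; None if unset or already collected
        return self._prev_stage() if self._prev_stage is not None else None

    @prev_stage.setter
    def prev_stage(self, stage):
        self._prev_stage = weakref.ref(stage)

    def __del__(self):
        deleted.append(self.pipeline_id)


gc.disable()  # Demonstrate that the cyclic collector is not needed here

stage_a = Pipeline("extract")
stage_b = Pipeline("transform")
stage_a.next_stage = stage_b
stage_b.prev_stage = stage_a  # Back-reference does not keep stage_a alive

del stage_a  # No strong references remain, so it is finalized immediately
del stage_b  # stage_a held the only other strong reference; now gone too

gc.enable()
print(deleted)
```

Code that reads prev_stage has to handle None, because the referenced stage may already have been collected.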
A Systematic Leak Investigation Workflow
Follow this sequence when investigating a memory leak:
- Run objgraph.show_growth() periodically to identify which type is accumulating;
- Use objgraph.by_type("TypeName") to get live instances and inspect their state;
- Use tracemalloc snapshot comparison to find the line allocating most memory;
- Check for unbounded caches, global accumulators, and reference cycles;
- Fix with lru_cache, deque(maxlen=n), weakref, or explicit del.
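The type-growth and snapshot-comparison steps can be combined into a small helper. The sketch below uses only the standard library: a Counter over gc.get_objects() serves as a rough stand-in for objgraph.show_growth() (it sees only objects tracked by the garbage collector, so plain ints and strings are not counted), and tracemalloc locates the allocating lines. The names count_types, report_growth, leaked, and leaky_workload are hypothetical.

```python
import gc
import tracemalloc
from collections import Counter


def count_types():
    """Count live GC-tracked objects by type name."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def report_growth(workload, top=3):
    """Run workload once and return (type growth, top allocation diffs)."""
    gc.collect()
    tracemalloc.start()
    snapshot_before = tracemalloc.take_snapshot()
    types_before = count_types()

    workload()

    gc.collect()
    types_after = count_types()
    snapshot_after = tracemalloc.take_snapshot()
    tracemalloc.stop()

    # Counter subtraction keeps only the types whose count increased
    growth = (types_after - types_before).most_common(top)
    allocations = snapshot_after.compare_to(snapshot_before, "lineno")[:top]
    return growth, allocations


leaked = []  # Hypothetical global accumulator


def leaky_workload():
    for i in range(1000):
        leaked.append({"id": i, "data": list(range(100))})


growth, allocations = report_growth(leaky_workload)
for type_name, delta in growth:
    print(f"{type_name}: +{delta}")
for stat in allocations:
    print(stat)
```

In a long-running service the same helper could be called around a single request or batch. A type whose count rises on every call is a candidate for the cache, global, and cycle checks in the list above.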