Finding and Fixing Memory Leaks

A memory leak in Python is not the same as in C. The interpreter frees objects automatically, so memory is rarely lost outright and Python will not access freed memory. Instead, a program keeps objects alive longer than necessary, and the process grows until it exhausts available RAM. The most common causes are unbounded caches, forgotten references in global scope, and reference cycles involving __del__.

Pattern 1: Unbounded Cache Growth

A dictionary used as a cache with no eviction policy grows indefinitely:

    import tracemalloc

    # Simulating an unbounded cache leak
    result_cache = {}

    def compute_report(report_id):
        if report_id not in result_cache:
            result_cache[report_id] = list(range(750))  # Expensive computation
        return result_cache[report_id]

    tracemalloc.start()
    snapshot_before = tracemalloc.take_snapshot()

    for report_id in range(2500):
        compute_report(report_id)  # Cache grows without bound

    snapshot_after = tracemalloc.take_snapshot()
    stats = snapshot_after.compare_to(snapshot_before, "lineno")
    print(stats[0])  # Shows the cache line as the top allocator

    tracemalloc.stop()

Fix – use functools.lru_cache or limit the cache size manually:

    import functools

    # Bounded cache with automatic eviction
    @functools.lru_cache(maxsize=256)
    def compute_report(report_id):
        return list(range(1000))
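The text mentions limiting the cache size manually as an alternative to lru_cache. A minimal sketch of that idea, assuming a least-recently-used policy built on collections.OrderedDict; the MAX_ENTRIES constant and its value are hypothetical, chosen only to mirror the maxsize above:

```python
from collections import OrderedDict

MAX_ENTRIES = 256  # Hypothetical limit, mirrors maxsize=256 above

result_cache = OrderedDict()


def compute_report(report_id):
    """Return a cached report, evicting the least recently used entry when full."""
    if report_id in result_cache:
        # Mark the entry as most recently used
        result_cache.move_to_end(report_id)
        return result_cache[report_id]

    report = list(range(1000))  # Stand-in for the expensive computation
    result_cache[report_id] = report
    if len(result_cache) > MAX_ENTRIES:
        # Drop the least recently used entry (the oldest end of the dict)
        result_cache.popitem(last=False)
    return report


for report_id in range(2500):
    compute_report(report_id)

print(len(result_cache))  # 256
```

A hand-rolled cache like this is mainly useful when the eviction rule needs to differ from what lru_cache offers, for example evicting by entry age or by total size.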

Pattern 2: Globals Accumulating State

Module-level lists or dicts that append without clearing are a common source of growth:

    import objgraph

    event_log = []  # Global – never cleared

    def process_event(event_id):
        event_log.append({"id": event_id, "data": list(range(100))})

    objgraph.show_growth()

    for event_id in range(1000):
        process_event(event_id)

    objgraph.show_growth()  # Shows list and dict growth

Fix – use a bounded deque or clear periodically:

    from collections import deque

    event_log = deque(maxlen=500)  # Automatically evicts old entries

    def process_event(event_id):
        event_log.append({"id": event_id, "data": list(range(100))})
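The other option named above, clearing periodically, might look like the following sketch. It assumes the buffered events can be handed off in batches; FLUSH_THRESHOLD, flush_events and flushed_batches are hypothetical names, and the "sink" here only records batch sizes instead of writing anywhere:

```python
event_log = []
FLUSH_THRESHOLD = 500   # Hypothetical batch size
flushed_batches = []    # Stand-in for a real sink (file, database, queue)


def flush_events():
    """Hand the buffered events to the sink, then empty the buffer in place."""
    flushed_batches.append(len(event_log))
    event_log.clear()


def process_event(event_id):
    event_log.append({"id": event_id, "data": list(range(100))})
    if len(event_log) >= FLUSH_THRESHOLD:
        flush_events()


for event_id in range(1200):
    process_event(event_id)

print(len(event_log))    # 200
print(flushed_batches)   # [500, 500]
```

Unlike deque(maxlen=n), this keeps every event until it has been flushed, so nothing is silently dropped, at the cost of having to decide where flushed events go.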

Pattern 3: Reference Cycles with __del__

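Before Python 3.4, objects with a __del__ method that formed reference cycles were not collected by the GC; they were left in gc.garbage instead. Python 3.4 and later fix this (PEP 442), but a cycle is still not freed by reference counting alone: it stays in memory until the cyclic collector runs, which delays collection and can increase peak memory: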

    import gc

    class Pipeline:
        def __init__(self, pipeline_id):
            self.pipeline_id = pipeline_id
            self.next_stage = None

        def __del__(self):
            print(f"Pipeline {self.pipeline_id} deleted")

    stage_a = Pipeline("extract")
    stage_b = Pipeline("transform")
    stage_a.next_stage = stage_b
    stage_b.next_stage = stage_a  # Cycle

    del stage_a
    del stage_b

    print("Before gc.collect()")
    gc.collect()  # Forces collection of the cycle
    print("After gc.collect()")

Fix – break cycles explicitly or use weakref for back-references.
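As a rough illustration of the weakref option, the sketch below keeps the forward link strong and stores the back-reference as a weak reference, so no cycle forms. The link() method and the previous_stage property are hypothetical additions to the Pipeline class, and the immediate release shown relies on CPython's reference counting:

```python
import gc
import weakref


class Pipeline:
    def __init__(self, pipeline_id):
        self.pipeline_id = pipeline_id
        self.next_stage = None   # Strong forward reference
        self._previous = None    # Weak back-reference, set by link()

    def link(self, next_stage):
        """Connect this stage to the next one without creating a cycle."""
        self.next_stage = next_stage
        next_stage._previous = weakref.ref(self)

    @property
    def previous_stage(self):
        # Dereference the weak reference; yields None once the target is gone
        return self._previous() if self._previous is not None else None


stage_a = Pipeline("extract")
stage_b = Pipeline("transform")
stage_a.link(stage_b)

probe = weakref.ref(stage_a)  # Lets us observe when stage_a is freed
gc.disable()                  # Rule out the cyclic collector for this demo
del stage_a
freed_without_gc = probe() is None
gc.enable()

print(freed_without_gc)        # True on CPython
print(stage_b.previous_stage)  # None
```

Because the only strong reference to the first stage was the stage_a name, deleting it frees the object right away, with no need to wait for gc.collect().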

A Systematic Leak Investigation Workflow

Follow this sequence when investigating a memory leak:

  • Run objgraph.show_growth() periodically to identify which type is accumulating;
  • Use objgraph.by_type("TypeName") to get live instances and inspect their state;
  • Use tracemalloc snapshot comparison to find the line allocating most memory;
  • Check for unbounded caches, global accumulators, and reference cycles;
  • Fix with lru_cache, deque(maxlen=n), weakref, or explicit del.
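The first and third steps of this workflow can be sketched with the standard library alone. This is an approximation under stated assumptions: type_counts() is a hypothetical helper that counts GC-tracked objects via gc.get_objects() as a rough stand-in for objgraph.show_growth(), and leaky_store and handle_request are made-up names for the code under investigation:

```python
import gc
import tracemalloc
from collections import Counter


def type_counts():
    """Count live, GC-tracked objects by type name (rough stand-in for objgraph)."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())


leaky_store = []  # Hypothetical global accumulator under investigation


def handle_request(request_id):
    leaky_store.append({"id": request_id, "payload": [request_id] * 50})


tracemalloc.start()
counts_before = type_counts()
snapshot_before = tracemalloc.take_snapshot()

for request_id in range(2000):
    handle_request(request_id)

counts_after = type_counts()
snapshot_after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Step 1: which types are accumulating?
growth = counts_after - counts_before  # Keeps only positive deltas
for type_name, delta in growth.most_common(3):
    print(f"{type_name}: +{delta}")

# Step 3: which source lines allocated the most new memory?
for stat in snapshot_after.compare_to(snapshot_before, "lineno")[:3]:
    print(stat)
```

In this run the dict and list counts should both grow by roughly the number of requests, and the tracemalloc comparison should point at the append line inside handle_request, which together narrow the search to the global accumulator.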