Finding and Fixing Memory Leaks

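A memory leak in Python is not the same as in C. In C, a leak is memory that was allocated and never freed. Python manages memory for you: you never free objects by hand or touch freed memory, and reference counting plus the garbage collector reclaim an object once nothing refers to it. A Python leak is therefore an object that is still reachable but no longer needed. The program keeps it alive longer than necessary, and the process grows until it exhausts available RAM. The most common causes are unbounded caches, forgotten references in global scope, and reference cycles, particularly those involving objects that define __del__.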
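Here is a minimal CPython sketch of the distinction. An object with no remaining references is freed as soon as its last reference disappears. An object still referenced from a module-level list stays alive indefinitely. Report and keep_alive are hypothetical names. weakref.ref is used only to observe the objects without keeping them alive. The immediate freeing relies on CPython's reference counting; other implementations such as PyPy may reclaim the object later.

```python
import weakref

class Report:
    """Hypothetical payload object used only for this demonstration."""
    def __init__(self, size):
        self.data = list(range(size))

keep_alive = []  # Module-level list: anything appended here stays reachable

def make_reports():
    temporary = Report(1000)
    retained = Report(1000)
    keep_alive.append(retained)  # Forgotten reference: this is the "leak"
    # Weak references observe the objects without keeping them alive
    return weakref.ref(temporary), weakref.ref(retained)

temp_ref, kept_ref = make_reports()

# CPython frees `temporary` as soon as make_reports() returns (refcount hits zero)
print(temp_ref() is None)      # True
# `retained` is still reachable through keep_alive, so it is never freed
print(kept_ref() is not None)  # True
```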

Pattern 1: Unbounded Cache Growth

A dictionary used as a cache with no eviction policy grows indefinitely:


```python
import tracemalloc

# Simulating an unbounded cache leak
result_cache = {}

def compute_report(report_id):
    if report_id not in result_cache:
        result_cache[report_id] = list(range(750))  # Stand-in for an expensive computation
    return result_cache[report_id]

tracemalloc.start()
snapshot_before = tracemalloc.take_snapshot()

for report_id in range(2500):
    compute_report(report_id)  # Every new report_id adds an entry; nothing is evicted

snapshot_after = tracemalloc.take_snapshot()
stats = snapshot_after.compare_to(snapshot_before, "lineno")
print(stats[0])  # Shows the cache-filling line as the top allocator

tracemalloc.stop()
```

Fix – use functools.lru_cache or limit the cache size manually:


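```python
import functools

# Bounded cache with automatic eviction: at most 256 results are kept,
# and the least recently used entry is dropped when the limit is reached
@functools.lru_cache(maxsize=256)
def compute_report(report_id):
    return list(range(750))  # Same stand-in computation as in the leaking version
```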
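The "limit the cache size manually" option can be sketched with collections.OrderedDict as a small least-recently-used cache. This is useful when you need more control than lru_cache gives, for example a cache shared by several functions. BoundedCache, get_or_compute, and max_entries are hypothetical names, not a standard-library API.

```python
from collections import OrderedDict

class BoundedCache:
    """Hypothetical LRU cache that never holds more than max_entries items."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    def get_or_compute(self, key, compute):
        if key in self._entries:
            self._entries.move_to_end(key)  # Mark as most recently used
            return self._entries[key]
        value = compute(key)
        self._entries[key] = value
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # Evict the least recently used entry
        return value

    def __len__(self):
        return len(self._entries)

report_cache = BoundedCache(max_entries=256)

def compute_report(report_id):
    return report_cache.get_or_compute(report_id, lambda _id: list(range(750)))

for report_id in range(2500):
    compute_report(report_id)

print(len(report_cache))  # 256: the cache stops growing at its limit
```

functools.lru_cache gives the same bound without extra code. Its cache_info() method reports hits, misses, maxsize, and currsize, which is a quick way to confirm that the cache has stopped growing.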

Pattern 2: Globals Accumulating State

Module-level lists or dicts that are appended to but never cleared are a common source of growth:


```python
import objgraph  # Third-party package: pip install objgraph

event_log = []  # Global – never cleared

def process_event(event_id):
    event_log.append({"id": event_id, "data": list(range(100))})

objgraph.show_growth()  # First call prints current counts and records a baseline

for event_id in range(1000):
    process_event(event_id)

objgraph.show_growth()  # Shows list and dict growth since the previous call
```
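objgraph is a third-party package. Without it, you can build a rough stdlib-only approximation of show_growth() by counting live objects by type with gc.get_objects(). This sketch illustrates the idea and is not objgraph's implementation. count_types and print_growth are hypothetical helper names. gc.get_objects() only returns objects tracked by the cyclic garbage collector (containers such as lists, most dicts, and class instances), so ints and strings do not appear.

```python
import gc
from collections import Counter

def count_types():
    """Count live, GC-tracked objects by type name."""
    gc.collect()  # Drop unreachable objects so only live ones are counted
    return Counter(type(obj).__name__ for obj in gc.get_objects())

def print_growth(before, after, limit=5):
    """Print the types whose instance counts increased the most."""
    growth = after - before  # Counter subtraction keeps only positive differences
    for type_name, delta in growth.most_common(limit):
        print(f"{type_name:<20} {after[type_name]:>8} {delta:>+8}")
    return growth

event_log = []  # Global – never cleared

def process_event(event_id):
    event_log.append({"id": event_id, "data": list(range(100))})

baseline = count_types()
for event_id in range(1000):
    process_event(event_id)
growth = print_growth(baseline, count_types())
```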

Fix – use a bounded deque or clear periodically:


```python
from collections import deque

event_log = deque(maxlen=500)  # Automatically evicts old entries

def process_event(event_id):
    event_log.append({"id": event_id, "data": list(range(100))})
```
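Here is a minimal sketch of the "clear periodically" option. It fits cases where old entries must be persisted rather than silently dropped, as deque(maxlen=500) drops them. write_batch, flushed_batches, and FLUSH_THRESHOLD are hypothetical names. In real code the sink would be a file, a database, or a log shipper.

```python
event_log = []         # Global buffer, drained before it can grow without bound
FLUSH_THRESHOLD = 500
flushed_batches = []   # Hypothetical sink; here it only records batch sizes

def write_batch(batch):
    """Hypothetical persistence step: records the batch size and keeps nothing else."""
    flushed_batches.append(len(batch))

def process_event(event_id):
    event_log.append({"id": event_id, "data": list(range(100))})
    if len(event_log) >= FLUSH_THRESHOLD:
        write_batch(list(event_log))  # Hand a copy of the entries to the sink
        event_log.clear()             # Drop references so the entries can be freed

for event_id in range(1200):
    process_event(event_id)

print(flushed_batches, len(event_log))  # [500, 500] 200
```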

Pattern 3: Reference Cycles with __del__

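Before Python 3.4, objects that defined __del__ and were part of a reference cycle were never freed by the garbage collector; they were moved to gc.garbage instead. Python 3.4 (PEP 442) fixed this. However, reference counting alone still cannot reclaim a cycle. Cyclic objects survive until the cyclic garbage collector runs, which delays cleanup and raises peak memory: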


```python
import gc

class Pipeline:
    def __init__(self, pipeline_id):
        self.pipeline_id = pipeline_id
        self.next_stage = None

    def __del__(self):
        print(f"Pipeline {self.pipeline_id} deleted")

stage_a = Pipeline("extract")
stage_b = Pipeline("transform")
stage_a.next_stage = stage_b
stage_b.next_stage = stage_a  # Cycle: each object keeps the other's refcount above zero

del stage_a
del stage_b  # The names are gone, but both objects stay alive inside the cycle

print("Before gc.collect()")
gc.collect()  # Forces collection of the cycle; the "deleted" messages typically print here
print("After gc.collect()")
```

Fix – break cycles explicitly or use weakref for back-references.
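Here is a minimal sketch of the weakref fix, using a variant of the Pipeline class above. The back-reference (a hypothetical previous_stage property) is stored as a weakref.ref, so the two stages no longer form a strong cycle. CPython's reference counting then frees them as soon as the last name goes away, with no need for gc.collect(). The deleted list is only there to record the order in which finalizers run.

```python
import weakref

deleted = []  # Records finalizer calls so the behavior can be checked

class Pipeline:
    def __init__(self, pipeline_id):
        self.pipeline_id = pipeline_id
        self.next_stage = None        # Strong forward reference
        self._previous_stage = None   # Weak back-reference, set via the property

    @property
    def previous_stage(self):
        # Dereference the weakref; returns None if the target was already freed
        return self._previous_stage() if self._previous_stage is not None else None

    @previous_stage.setter
    def previous_stage(self, stage):
        self._previous_stage = weakref.ref(stage)

    def __del__(self):
        deleted.append(self.pipeline_id)
        print(f"Pipeline {self.pipeline_id} deleted")

stage_a = Pipeline("extract")
stage_b = Pipeline("transform")
stage_a.next_stage = stage_b       # Strong: extract owns transform
stage_b.previous_stage = stage_a   # Weak: no cycle is formed

print(stage_b.previous_stage.pipeline_id)  # extract

del stage_b  # stage_a.next_stage still keeps transform alive
del stage_a  # Freeing extract drops the last strong reference to transform

print(deleted)  # ['extract', 'transform'], without calling gc.collect()
```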

A Systematic Leak Investigation Workflow

Follow this sequence when investigating a memory leak:

1. Run objgraph.show_growth() periodically to identify which type is accumulating.
2. Use objgraph.by_type("TypeName") to get the live instances of that type and inspect their state.
3. Use tracemalloc snapshot comparison to find the line that allocates the most memory.
4. Check for unbounded caches, global accumulators, and reference cycles.
5. Fix with lru_cache, deque(maxlen=n), weakref, or an explicit del.
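You can script the tracemalloc step of this workflow into a reusable check. The check runs a suspect workload several times and keeps only the source lines whose allocations grew in every round. A steady leak passes this filter, while one-off allocations drop out. This is a sketch assuming the workload is a zero-argument callable. find_growing_lines, its parameters, and the leaky_workload example are hypothetical. The helper uses only the standard tracemalloc API (start, take_snapshot, filter_traces, compare_to, stop).

```python
import tracemalloc

# Exclude allocations made inside tracemalloc.py itself: snapshots get bigger
# as more blocks are traced and would otherwise look like a leak
_IGNORE_TRACEMALLOC = (tracemalloc.Filter(False, tracemalloc.__file__),)


def _snapshot():
    return tracemalloc.take_snapshot().filter_traces(_IGNORE_TRACEMALLOC)


def _site(stat_diff):
    """Return (filename, lineno) for a StatisticDiff grouped by "lineno"."""
    frame = stat_diff.traceback[0]
    return frame.filename, frame.lineno


def find_growing_lines(workload, rounds=5, limit=3):
    """Hypothetical helper: call `workload` repeatedly and return up to `limit`
    (filename, lineno, bytes_grown) tuples for lines whose allocations grew
    in every round, largest total growth first. Requires rounds >= 1."""
    tracemalloc.start()
    try:
        workload()  # Warm-up call so one-time allocations are not reported
        baseline = previous = _snapshot()
        suspects = None
        for _ in range(rounds):
            workload()
            current = _snapshot()
            grew = {_site(d) for d in current.compare_to(previous, "lineno")
                    if d.size_diff > 0}
            # Keep only lines that have grown in every round so far
            suspects = grew if suspects is None else suspects & grew
            previous = current
        totals = {_site(d): d.size_diff
                  for d in current.compare_to(baseline, "lineno")}
    finally:
        tracemalloc.stop()
    ranked = sorted(suspects, key=lambda site: totals.get(site, 0), reverse=True)
    return [(filename, lineno, totals.get((filename, lineno), 0))
            for filename, lineno in ranked[:limit]]


# Usage with a deliberately leaky workload (hypothetical)
retained = []

def leaky_workload():
    retained.append(bytearray(100_000))  # About 100 KB kept alive per call

for filename, lineno, size in find_growing_lines(leaky_workload):
    print(f"{filename}:{lineno} grew by {size / 1024:.1f} KiB")
```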
