Learn Generators vs Lists for Large Data | Writing Memory-Efficient Code
Python Memory Management

Generators vs Lists for Large Data

A list stores all its elements in memory at once. A generator produces elements one at a time, on demand, holding only the current element in memory. For large datasets, this difference is the gap between a program that runs and one that crashes with MemoryError.
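To make "on demand" concrete, here is a minimal sketch that counts how many elements have actually been computed. The helper `tracked_square` and the `produced` log are hypothetical names introduced only for this illustration.

```python
# Log of every input that has actually been processed so far
produced = []

def tracked_square(n):
    """Square n and record that the computation happened (illustrative helper)."""
    produced.append(n)
    return n * n

# A list comprehension computes every element as soon as it is built
squares_list = [tracked_square(n) for n in range(5)]
list_calls = len(produced)

produced.clear()

# A generator expression computes nothing until an element is requested
squares_generator = (tracked_square(n) for n in range(5))
calls_before_next = len(produced)

first_square = next(squares_generator)
calls_after_next = len(produced)

print(list_calls, calls_before_next, calls_after_next)
```

The list runs all five computations up front, while the generator runs none until `next()` asks for a value, and then runs exactly one.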

The Memory Cost of Lists

When you build a list comprehension, Python allocates memory for every element immediately:

```python
import tracemalloc

tracemalloc.start()

# Building a full list of 1 million records in memory
records_list = [{"id": record_id, "value": record_id * 2.5} for record_id in range(1000000)]

snapshot = tracemalloc.take_snapshot()
total = sum(s.size for s in snapshot.statistics("lineno"))
print(f"List memory: {total / 1024 / 1024:.1f} MB")

del records_list
tracemalloc.stop()
```

Generators Use Near-Zero Memory

A generator expression has the same syntax as a list comprehension but with parentheses instead of brackets. It stores no elements – it yields them one at a time:

```python
import sys
import tracemalloc

tracemalloc.start()

# Generator holds no elements – just the recipe to produce them
records_generator = ({"id": record_id, "value": record_id * 2.5} for record_id in range(1000000))

snapshot = tracemalloc.take_snapshot()
total = sum(s.size for s in snapshot.statistics("lineno"))
print(f"Generator memory: {total / 1024 / 1024:.1f} MB")  # Near zero

print(sys.getsizeof(records_generator))  # ~200 bytes regardless of range size

tracemalloc.stop()
```

The generator object itself is tiny – it holds only the iterator state, not the data.
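The measurement above is taken before the generator is consumed, so it is also worth checking peak memory while the data is actually being processed. The following is a rough sketch using `tracemalloc.get_traced_memory()`. The helper names (`peak_bytes`, `sum_with_list`, `sum_with_generator`) and the size of 200,000 elements are assumptions chosen for illustration, and exact numbers will vary by Python version and platform.

```python
import tracemalloc

def peak_bytes(task):
    """Run task() and return the peak traced memory in bytes (illustrative helper)."""
    tracemalloc.start()
    task()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

def sum_with_list():
    # The whole list exists in memory before sum() starts
    return sum([n * 2.5 for n in range(200000)])

def sum_with_generator():
    # sum() pulls one value at a time; no list is ever built
    return sum(n * 2.5 for n in range(200000))

list_peak = peak_bytes(sum_with_list)
generator_peak = peak_bytes(sum_with_generator)

print(f"List peak:      {list_peak / 1024:.1f} KB")
print(f"Generator peak: {generator_peak / 1024:.1f} KB")
```

Both functions compute the same sum, but only the list version has to hold every intermediate value at once.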

Processing Large Files with Generators

Generators are the standard tool for processing files that don't fit in memory:

```python
# Reading and processing a large CSV-like dataset line by line
def parse_transactions(filename):
    with open(filename, "r") as file:
        next(file)  # Skipping the header line
        for line in file:
            parts = line.strip().split(",")
            yield {"id": parts[0], "amount": float(parts[1]), "currency": parts[2]}

# Processing without loading the full file into memory
def total_revenue(filename):
    total = 0.0
    for transaction in parse_transactions(filename):
        if transaction["currency"] == "USD":
            total += transaction["amount"]
    return total
```

Only the current line and its parsed record are held in memory at any time (plus a small read buffer), regardless of file size.
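To try the two functions end to end, the sketch below writes a tiny temporary file and sums its USD rows. The functions are repeated so the example runs on its own. The sample rows and the temporary-file handling are invented for this illustration and are not part of the lesson's dataset.

```python
import os
import tempfile

def parse_transactions(filename):
    with open(filename, "r") as file:
        next(file)  # Skip the header line
        for line in file:
            parts = line.strip().split(",")
            yield {"id": parts[0], "amount": float(parts[1]), "currency": parts[2]}

def total_revenue(filename):
    total = 0.0
    for transaction in parse_transactions(filename):
        if transaction["currency"] == "USD":
            total += transaction["amount"]
    return total

# Hypothetical sample data: a header plus three transactions
sample_rows = "id,amount,currency\n1,10.50,USD\n2,99.00,EUR\n3,4.50,USD\n"

with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as handle:
    handle.write(sample_rows)
    sample_path = handle.name

try:
    usd_total = total_revenue(sample_path)
finally:
    os.remove(sample_path)  # Clean up the temporary file

print(usd_total)
```

The same call pattern applies to a file of any size, since `total_revenue` never asks for more than one transaction at a time.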

Generator Functions vs Generator Expressions

```python
# Generator expression – inline, for simple transformations
amounts = (row["amount"] for row in parse_transactions("transactions.csv"))

# Generator function – for multi-step logic with yield
def high_value_transactions(filename, threshold):
    for transaction in parse_transactions(filename):
        if transaction["amount"] > threshold:
            yield transaction
```
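The two forms can also be chained into a pipeline in which each stage pulls one item at a time from the stage before it. The sketch below is a variant under stated assumptions: `high_value` takes any iterable of transaction dicts rather than a filename, and the in-memory `rows` iterator stands in for `parse_transactions` so the example runs without a file.

```python
def high_value(transactions, threshold):
    """Yield only transactions above the threshold (hypothetical iterable-based variant)."""
    for transaction in transactions:
        if transaction["amount"] > threshold:
            yield transaction

# Stand-in for parse_transactions(...): an iterator over made-up records
rows = iter([
    {"id": "1", "amount": 20.0, "currency": "USD"},
    {"id": "2", "amount": 75.0, "currency": "USD"},
    {"id": "3", "amount": 120.0, "currency": "EUR"},
    {"id": "4", "amount": 5.0, "currency": "USD"},
])

# Stage 1: generator function filters records
large_transactions = high_value(rows, threshold=50)

# Stage 2: generator expression extracts one field
large_amounts = (transaction["amount"] for transaction in large_transactions)

# Nothing has been processed yet; sum() drives the whole pipeline
large_total = sum(large_amounts)
print(large_total)
```

No intermediate list is created at any stage, so memory use stays flat however many records flow through.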

When to Use Each
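One practical difference that bears on this choice, shown here as an illustration rather than a complete guideline: a list can be indexed and iterated repeatedly, whereas a generator is exhausted after a single pass. The variable names below are illustrative only.

```python
values_list = [n * n for n in range(5)]
values_generator = (n * n for n in range(5))

# A list supports repeated passes and random access
list_first_pass = sum(values_list)
list_second_pass = sum(values_list)
third_value = values_list[2]

# A generator can be consumed only once
generator_first_pass = sum(values_generator)
generator_second_pass = sum(values_generator)  # Already exhausted, so nothing is left to add

print(list_first_pass, list_second_pass, third_value)
print(generator_first_pass, generator_second_pass)
```

If the data must be reused, indexed, or measured with `len()`, a list is likely the better fit. If it is read once from start to finish, a generator avoids holding it all in memory.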

What is the key memory advantage of a generator over a list?

Section 2. Chapter 2