Generators vs Lists for Large Data
A list stores all its elements in memory at once. A generator produces elements one at a time, on demand, holding only the current element in memory. For large datasets, this difference is the gap between a program that runs and one that crashes with MemoryError.
The Memory Cost of Lists
When you build a list comprehension, Python allocates memory for every element immediately:
    import tracemalloc

    tracemalloc.start()

    # Build a full list of 1 million records in memory
    records_list = [{"id": record_id, "value": record_id * 2.5} for record_id in range(1_000_000)]

    # Sum the sizes of all allocations traced since start()
    snapshot = tracemalloc.take_snapshot()
    total = sum(s.size for s in snapshot.statistics("lineno"))
    print(f"List memory: {total / 1024 / 1024:.1f} MB")

    del records_list
    tracemalloc.stop()
Generators Use Near-Zero Memory
A generator expression has the same syntax as a list comprehension but with parentheses instead of brackets. It stores no elements – it yields them one at a time:
    import sys
    import tracemalloc

    tracemalloc.start()

    # Generator holds no elements – just the recipe to produce them
    records_generator = ({"id": record_id, "value": record_id * 2.5} for record_id in range(1_000_000))

    snapshot = tracemalloc.take_snapshot()
    total = sum(s.size for s in snapshot.statistics("lineno"))
    print(f"Generator memory: {total / 1024 / 1024:.1f} MB")  # Near zero

    # Size of the generator object itself: small and constant in CPython
    # (on the order of 100-200 bytes), regardless of range size
    print(sys.getsizeof(records_generator))

    tracemalloc.stop()
The generator object itself is tiny – it holds only the iterator state, not the data.
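The measurement above inspects the generator before it is consumed. As a complementary sketch, the peak memory while the data is actually being summed can be compared with `tracemalloc.get_traced_memory()`. The helper name `peak_bytes` and the sample size of 200,000 are assumptions chosen for illustration, and exact numbers will vary by Python version and platform.

```python
import tracemalloc


def peak_bytes(work):
    """Run work() and return the peak traced memory in bytes (hypothetical helper)."""
    tracemalloc.start()
    work()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


# List comprehension: every value exists in memory before sum() starts
list_peak = peak_bytes(lambda: sum([n * 2.5 for n in range(200_000)]))

# Generator expression: sum() pulls one value at a time
gen_peak = peak_bytes(lambda: sum(n * 2.5 for n in range(200_000)))

print(f"List peak:      {list_peak / 1024:.0f} KiB")
print(f"Generator peak: {gen_peak / 1024:.0f} KiB")
```

Both calls compute the same sum; only the peak memory differs, because the generator version never materializes the full sequence.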
Processing Large Files with Generators
Generators are the standard tool for processing files that don't fit in memory:
    # Read and parse a large CSV-like dataset line by line
    def parse_transactions(filename):
        with open(filename, "r") as file:
            next(file)  # Skip the header line
            for line in file:
                parts = line.strip().split(",")
                yield {"id": parts[0], "amount": float(parts[1]), "currency": parts[2]}

    # Process without loading the full file into memory
    def total_revenue(filename):
        total = 0.0
        for transaction in parse_transactions(filename):
            if transaction["currency"] == "USD":
                total += transaction["amount"]
        return total
Only one parsed line is held in memory at any time (plus the file object's small read buffer), regardless of file size.
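Splitting on `","` breaks if a field contains a quoted comma. A minimal sketch of the same streaming idea using the standard library's `csv.DictReader`, which also reads lazily, is shown here. The function name `parse_transactions_csv`, the header names `id`, `amount`, `currency`, and the in-memory sample data are assumptions made for illustration; a real file may use different column names.

```python
import csv
import io


def parse_transactions_csv(lines):
    """Yield one transaction dict per CSV row (hypothetical variant)."""
    # DictReader consumes the header row and reads the remaining rows lazily
    reader = csv.DictReader(lines)
    for row in reader:
        yield {
            "id": row["id"],
            "amount": float(row["amount"]),
            "currency": row["currency"],
        }


# In-memory stand-in for a file opened with open(filename, newline="")
sample = io.StringIO(
    "id,amount,currency\n"
    "t1,19.99,USD\n"
    "t2,5.00,EUR\n"
    "t3,100.01,USD\n"
)

usd_total = sum(
    t["amount"] for t in parse_transactions_csv(sample) if t["currency"] == "USD"
)
print(round(usd_total, 2))
```

Because `parse_transactions_csv` accepts any iterable of lines, an open file object can be passed in the same way as the `io.StringIO` sample.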
Generator Functions vs Generator Expressions
    # Generator expression – inline, for simple transformations
    amounts = (row["amount"] for row in parse_transactions("transactions.csv"))

    # Generator function – for multi-step logic with yield
    def high_value_transactions(filename, threshold):
        for transaction in parse_transactions(filename):
            if transaction["amount"] > threshold:
                yield transaction
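Generator functions and generator expressions can be chained into a pipeline in which each stage pulls items from the previous one only when asked. The sketch below uses in-memory rows instead of a file so it runs on its own; the names `read_amounts` and `above` are hypothetical and not part of the lesson's code.

```python
from itertools import islice


def read_amounts(rows):
    """Generator function: extract the amount from each row."""
    for row in rows:
        yield row["amount"]


def above(amounts, threshold):
    """Generator function: pass through only amounts over the threshold."""
    for amount in amounts:
        if amount > threshold:
            yield amount


# Generator expression standing in for a large data source
rows = ({"amount": n * 10.0} for n in range(1_000_000))

# islice stops the pipeline after three results, so only the first
# few source rows are ever produced
first_three = list(islice(above(read_amounts(rows), 50.0), 3))
print(first_three)  # [60.0, 70.0, 80.0]
```

Nothing runs until `list(islice(...))` starts requesting values, and the remaining rows of the source are never generated.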
When to Use Each
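One way to frame the choice, offered as a sketch rather than the lesson's own guidance: a generator can be consumed only once and supports neither `len()` nor indexing, while a list can be traversed repeatedly at the cost of holding every element. The variable names below are illustrative.

```python
# A generator is exhausted after a single pass
squares_gen = (n * n for n in range(5))
first_pass = sum(squares_gen)   # consumes all five values
second_pass = sum(squares_gen)  # nothing left to yield

print(first_pass, second_pass)  # 30 0

# A list supports repeated passes, len(), and indexing
squares_list = [n * n for n in range(5)]
print(sum(squares_list), sum(squares_list))  # 30 30
print(len(squares_list), squares_list[-1])   # 5 16
```

A generator fits a single streaming pass over data too large to hold; a list fits data that is small enough to keep and that needs to be revisited, counted, or indexed.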