Small Integer Cache and String Interning
CPython pre-allocates certain objects and reuses them instead of creating new ones. Two of the most impactful optimizations are the small integer cache and string interning. Understanding them prevents subtle bugs and explains why comparisons with the is operator sometimes give surprising results.
The Small Integer Cache
CPython pre-allocates integer objects for values in the range -5 to 256 at interpreter startup. Any time your code uses one of these values, it gets a reference to the cached object – no new allocation happens.
    # Demonstrating the small integer cache
    transaction_a = 100
    transaction_b = 100
    print(transaction_a is transaction_b)  # True: same cached object

    # Built at runtime so the compiler cannot merge the two equal constants
    large_a = int("1000")
    large_b = int("1000")
    print(large_a is large_b)  # False: two separate objects
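is checks object identity (in CPython, the same memory address), not equality. Outside the cached range, two integers with the same value that are created at runtime are usually separate objects. Identical literals in the same compiled code block can still end up as one object, because the compiler merges equal constants, so the result of is on large integers is not something to rely on.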
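To see where the cache ends without compile-time constant merging getting in the way, the following sketch builds each integer twice at runtime. The helper name shares_identity is made up for this illustration, and the results are CPython implementation details that may differ on other interpreters or versions.

```python
def shares_identity(value: int) -> bool:
    """Create the same integer twice at runtime and check whether both are one object."""
    first = int(str(value))
    second = int(str(value))
    return first is second


# Probe values around the documented cache boundaries (-5 and 256).
for candidate in (-6, -5, 0, 256, 257):
    print(candidate, shares_identity(candidate))
```

On a CPython build with the -5 to 256 cache described above, only the values inside that range are expected to report True.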
    import sys

    # Verifying that small integers share identity
    cached_value = 42
    another_ref = 42
    print(sys.getrefcount(cached_value))  # High: the cached 42 is shared throughout the interpreter

    # Parsed at runtime so each call produces its own object
    large_value = int("1000")
    another_large = int("1000")
    print(large_value is another_large)  # False: different objects
    print(large_value == another_large)  # True: same value
Always use == to compare values. Use is only to check identity (e.g., is None, is True, is False).
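A minimal sketch of the kind of bug this rule prevents: a counting helper that compares with is appears to work for small numbers and silently fails for larger ones. The function names and the sample data are hypothetical, chosen only for illustration.

```python
def count_matches_buggy(values, target):
    """Counts by identity: only reliable when the objects happen to be shared."""
    return sum(1 for value in values if value is target)


def count_matches(values, target):
    """Counts by value: works for any integers."""
    return sum(1 for value in values if value == target)


# Integers parsed from text, as they would be when read from a file or a request.
parsed_amounts = [int(text) for text in ["100", "1000", "1000"]]
small_target = int("100")
large_target = int("1000")

print(count_matches_buggy(parsed_amounts, small_target))  # 1: 100 is in the cache
print(count_matches_buggy(parsed_amounts, large_target))  # usually 0 on CPython: identity fails
print(count_matches(parsed_amounts, large_target))        # 2: value comparison is correct
```

The buggy version passes any test that only uses small numbers, which is what makes this mistake hard to spot.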
String Interning
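CPython automatically interns string literals that look like valid Python identifiers, that is, compile-time string constants containing only ASCII letters, digits, and underscores. Interned strings share a single object in memory. This is an implementation detail, and the exact rules differ between CPython versions.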
    # Automatic interning of identifier-like strings
    department_a = "engineering"
    department_b = "engineering"
    print(department_a is department_b)  # True: interned automatically

    # Strings with spaces are not automatically interned; they are built at
    # runtime here so the compiler cannot merge the two equal constants
    label_a = " ".join(["Q1", "Revenue"])
    label_b = " ".join(["Q1", "Revenue"])
    print(label_a is label_b)  # False: two separate objects
    print(label_a == label_b)  # True: values are equal
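Automatic interning applies to literals the compiler sees, not to every string that happens to look like an identifier. The sketch below, which uses a hypothetical build_name helper, assembles an identifier-like string at runtime and compares it with the equivalent literal. The identity result is a CPython implementation detail.

```python
def build_name(prefix: str, suffix: str) -> str:
    """Concatenate at runtime so the result is not a compile-time constant."""
    return prefix + suffix


literal_name = "engineering"
built_name = build_name("engi", "neering")

print(built_name == literal_name)  # True: the values are equal
print(built_name is literal_name)  # usually False on CPython: a separate object
```

Strings read from files, databases, or network responses behave like built_name here, which is why they do not share memory unless they are interned explicitly.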
Manual Interning with sys.intern()
You can force interning of any string using sys.intern(). This is useful when the same string is repeated thousands of times – for example, column names in a large dataset:
    import sys

    # Interning repeated column names to save memory
    column_names_raw = ["revenue", "cost", "revenue", "profit", "cost", "revenue"]

    # Without interning: potentially multiple objects per unique string
    without_intern = column_names_raw

    # With interning: guaranteed single object per unique string
    with_intern = [sys.intern(name) for name in column_names_raw]

    # Verifying identity after interning
    print(with_intern[0] is with_intern[2])  # True: same interned object
    print(with_intern[1] is with_intern[4])  # True: same interned object
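In a dataset with millions of rows and a small set of repeated string values, interning can reduce memory usage significantly, because every repeated value becomes a reference to one shared string object instead of a separate copy.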
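The effect can be made visible by counting distinct objects before and after interning. This is a rough sketch: the distinct_objects helper and the simulated rows are assumptions for illustration, and the strings are assembled at runtime to stand in for values loaded from an external source.

```python
import sys


def distinct_objects(strings):
    """Return how many separate string objects back the given list."""
    return len({id(text) for text in strings})


# Simulate values read at runtime: each join call creates a fresh string object.
raw_rows = ["".join(["rev", "enue"]) for _ in range(1000)]
interned_rows = [sys.intern(value) for value in raw_rows]

print(distinct_objects(raw_rows))       # typically 1000 on CPython: one object per row
print(distinct_objects(interned_rows))  # 1: every row refers to the same object
```

The list itself still holds one reference per row, so the saving comes from the string payloads, and it grows with the length of the repeated values.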
Integer Cache vs String Interning