Optimization Techniques in Python
Efficient String Operations
Efficient String Concatenation
When working with many strings, it's essential to use an efficient method for concatenation. Using the + (or +=) operator repeatedly in a loop is inefficient for large datasets: strings are immutable, so each concatenation can create a new string and copy the existing contents. Instead, str.join() assembles the result once from all the pieces, which is typically much faster and more memory-efficient.
Let's compare the performance of two approaches for concatenating strings with newline characters into a single string. The first uses a for loop with the += operator, while the second leverages the more efficient str.join() method.
```python
import os

decorators = os.system('wget https://content-media-cdn.codefinity.com/courses/8d21890f-d960-4129-bc88-096e24211d53/section_1/chapter_3/decorators.py 2>/dev/null')
from decorators import timeit_decorator

# Simulated lines of a report
lines = [f"Line {i}" for i in range(1, 1000001)]

# Inefficient concatenation
@timeit_decorator(number=50)
def concat_with_plus():
    result = ""
    for line in lines:
        result += line + "\n"
    return result

# Efficient concatenation
@timeit_decorator(number=50)
def concat_with_join():
    return "\n".join(lines) + "\n"  # Add final newline for consistency

result_plus = concat_with_plus()
result_join = concat_with_join()
print(result_plus == result_join)
```
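The comparison above relies on a downloaded timeit_decorator helper whose implementation is not shown here. As a rough, self-contained alternative, the sketch below times the same two approaches with the standard-library timeit module. The smaller dataset and the repetition count are assumptions chosen only to keep the run short, and the measured times will vary by machine and Python version.

```python
import timeit

# Smaller dataset than the lesson's example (an assumption, to keep the run short)
lines = [f"Line {i}" for i in range(1, 100_001)]

def concat_with_plus():
    # Builds the result step by step with +=
    result = ""
    for line in lines:
        result += line + "\n"
    return result

def concat_with_join():
    # Builds the result in one call; trailing newline added for consistency
    return "\n".join(lines) + "\n"

# Both approaches must produce the same string
assert concat_with_plus() == concat_with_join()

# Average seconds per call over a few repetitions
runs = 5
plus_time = timeit.timeit(concat_with_plus, number=runs) / runs
join_time = timeit.timeit(concat_with_join, number=runs) / runs
print(f"+= loop : {plus_time:.4f} s per call")
print(f"join    : {join_time:.4f} s per call")
```

Passing the function object directly to timeit.timeit avoids building a setup string, and dividing by the repetition count gives a per-call average that is easier to compare.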
Precompiling Regular Expressions
When working with regular expressions in Python, performance can become a concern, especially when dealing with large datasets or repetitive pattern matching. In such cases, precompiling the pattern is a useful optimization technique.
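Precompiling with re.compile gives you a pattern object that can be reused directly, so the pattern string does not have to be processed by the re module on every call. The module-level functions such as re.match cache recently compiled patterns, so they rarely recompile from scratch, but each call still pays for a cache lookup. That overhead can add up when the same pattern is applied many times across a dataset. This approach is particularly beneficial in scenarios like filtering, validation, or searching in large text files.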
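As a small illustration of the filtering scenario, the sketch below compiles a pattern once and reuses it for every line. The log lines, their format, and the extract_errors helper are hypothetical and exist only for this example.

```python
import re

# Hypothetical log lines; the format is an assumption for illustration
log_lines = [
    "2024-01-05 ERROR disk full",
    "2024-01-05 INFO backup started",
    "2024-01-06 ERROR timeout",
    "not a log line",
]

# Compile once, outside the loop, and reuse the pattern object
error_pattern = re.compile(r"^\d{4}-\d{2}-\d{2} ERROR (.+)$")

def extract_errors(lines):
    """Return the message part of every line that matches the ERROR pattern."""
    messages = []
    for line in lines:
        match = error_pattern.match(line)
        if match:
            messages.append(match.group(1))
    return messages

print(extract_errors(log_lines))  # ['disk full', 'timeout']
```

Keeping the compiled pattern at module level means it is built once at import time, no matter how many times extract_errors is called.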
Let's compare the performance of two approaches for validating usernames using regular expressions. The first approach uses the re.match function with the pattern passed inline each time it's called. The second, more efficient approach precompiles the regex pattern using re.compile and reuses it for all validations.
```python
import os
import re

decorators = os.system('wget https://content-media-cdn.codefinity.com/courses/8d21890f-d960-4129-bc88-096e24211d53/section_1/chapter_3/decorators.py 2>/dev/null')
from decorators import timeit_decorator

# Simulated usernames
usernames = ["user123", "admin!@#", "test_user", "invalid!"] * 100000

# Naive approach
@timeit_decorator(number=10)
def validate_with_re():
    pattern = r"^\w+$"
    return [bool(re.match(pattern, username)) for username in usernames]

# Optimized approach
@timeit_decorator(number=10)
def validate_with_compiled_re():
    compiled_pattern = re.compile(r"^\w+$")
    return [bool(compiled_pattern.match(username)) for username in usernames]

result_without_precompiling = validate_with_re()
result_with_precompiling = validate_with_compiled_re()
print(result_without_precompiling == result_with_precompiling)
```
1. You are generating a report with 10000 lines, where each line represents a transaction summary. Which method is the most efficient for combining these lines into a single string with ; between them?
2. Why is precompiling a regular expression using re.compile() often faster than using re.match() with an inline pattern?