Optimization Techniques in Python
Lists and NumPy Arrays
Choosing the right data structure can significantly affect both speed and memory usage. Let's first explore lists and compare them with NumPy arrays to understand when and how to use these data structures effectively.
List
A list is one of the most commonly used data types in Python. It functions as a dynamic array, meaning its size can grow or shrink as needed. Lists offer fast (constant-time) access and modification at arbitrary indices. However, inserting or removing an element anywhere other than the end requires shifting every element after it, and checking whether an element is present (membership testing) requires scanning the list one element at a time, so both become slow for large lists. The exception is insertion or removal at the end of the list, which remains efficient (amortized constant time) regardless of the list's size.
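A rough timing sketch, using only the standard-library timeit module, contrasts the cheap and expensive list operations described above. The function names, the list size, and the repetition counts are illustrative choices, not part of the course; absolute timings vary by machine, so only the relative differences are meaningful.

```python
import timeit

n = 100_000  # list size used for every measurement (illustrative)

def append_to_end():
    lst = list(range(n))
    for i in range(1_000):
        lst.append(i)  # amortized O(1): no existing elements move
    return lst

def insert_at_front():
    lst = list(range(n))
    for i in range(1_000):
        lst.insert(0, i)  # O(n): every existing element shifts one slot right
    return lst

data = list(range(n))

# Both functions rebuild the same list first, so the difference comes from the loop
print('1,000 appends       :', timeit.timeit(append_to_end, number=10))
print('1,000 front inserts :', timeit.timeit(insert_at_front, number=10))
# Index access jumps straight to a slot; membership scans until it finds a match
print('index access x100   :', timeit.timeit(lambda: data[n - 1], number=100))
print('membership test x100:', timeit.timeit(lambda: (n - 1) in data, number=100))
```

Searching for n - 1, the last element, forces a full scan, which is the worst case for membership testing.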
Lists are a good choice in the following scenarios:
- You need ordered data;
- You frequently access or modify elements by index;
- You need to store different data types (e.g., integers, strings, or custom objects);
- You don't require fast membership testing or fast insertion into or removal from the middle of the list.
```python
my_list = [10, 20, 30]

# Access an element by index
print(my_list[1])

# Modify an element at a specific index
my_list[1] = 50
print(my_list)

# Append an element to the end of the list
my_list.append(40)
print(my_list)

# Remove an element from the end of the list
my_list.pop()
print(my_list)
```
NumPy Array
While lists are versatile, they are not the most efficient for large-scale numerical operations. This is where NumPy arrays come into play.
NumPy's core is implemented in C, and its arrays store values of a single type in contiguous memory, which makes them much faster than lists for numerical operations. One key factor is vectorization: an operation is applied to the entire array in a single call, with the looping done in compiled code rather than in explicit Python loops. This leads to significant performance gains, especially with large datasets.
Let's look at an example of squaring each element of a list (using a list comprehension) and of a NumPy array (using vectorization):
```python
import numpy as np
import os

# Download the helper module that provides the timing decorator
os.system('wget https://codefinity-content-media-v2.s3.eu-west-1.amazonaws.com/courses/8d21890f-d960-4129-bc88-096e24211d53/section_1/chapter_3/decorators.py 2>/dev/null')
from decorators import timeit_decorator

my_list = list(range(1, 100001))
arr = np.array(my_list)

# Square each element of the list with a list comprehension
@timeit_decorator(number=100)
def square_list(numbers_list):
    return [x ** 2 for x in numbers_list]

# Square each element of the array with a single vectorized operation
@timeit_decorator(number=100)
def square_array(numbers_array):
    return numbers_array ** 2

squares_list = square_list(my_list)
squares_array = square_array(arr)

# Verify that both approaches produce the same values
if np.array_equal(squares_array, squares_list):
    print('The array is equal to the list')
```
As you can see, the performance advantage of NumPy arrays is quite evident.
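The timing code above downloads a helper module (decorators.py) to provide timeit_decorator. If that file is unavailable, a minimal self-contained sketch can make the same comparison with the standard-library timeit module instead; this swaps in timeit.timeit for the course's decorator, and the repetition count simply mirrors number=100 from the original.

```python
import timeit
import numpy as np

my_list = list(range(1, 100_001))
arr = np.array(my_list)

def square_list(numbers_list):
    # Python-level loop: every element is a separate int object
    return [x ** 2 for x in numbers_list]

def square_array(numbers_array):
    # Vectorized: one call, with the element-wise loop running in compiled code
    return numbers_array ** 2

list_time = timeit.timeit(lambda: square_list(my_list), number=100)
array_time = timeit.timeit(lambda: square_array(arr), number=100)

print(f'list comprehension: {list_time:.4f} s')
print(f'NumPy vectorized:   {array_time:.4f} s')
```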
When dealing with numerical data, NumPy arrays also use memory more efficiently than lists. An array stores the raw values themselves in a single contiguous memory block, and because it is homogeneous (all elements share one data type), it avoids the per-element overhead of object references.
In contrast, a list can hold objects of different types, so it stores a contiguous block of references (pointers) to objects, while the objects themselves live elsewhere in memory. Each number in a list is therefore a full Python object with its own header, which adds considerable memory overhead when working with numerical data.
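A rough way to see this overhead is to compare sys.getsizeof for the list and its elements against the array's nbytes attribute. This is an approximate sketch assuming 64-bit CPython and an int64 array: CPython caches small integers, so summing the element sizes slightly overcounts, and the array figure excludes its small fixed header.

```python
import sys
import numpy as np

n = 100_000
my_list = list(range(n))
arr = np.array(my_list, dtype=np.int64)

# The list object itself holds only pointers to the elements
list_container = sys.getsizeof(my_list)
# Each element is a separate int object with its own header (approximate)
list_elements = sum(sys.getsizeof(x) for x in my_list)

# The array stores the raw 8-byte integers contiguously
array_data = arr.nbytes

print('list (pointers only):', list_container, 'bytes')
print('list + int objects  :', list_container + list_elements, 'bytes')
print('NumPy array data    :', array_data, 'bytes')
```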
To summarize, the following table compares lists with NumPy arrays: