Lists and NumPy Arrays | Efficient Use of Data Structures

Choosing the right data structure can significantly affect both speed and memory usage. Let's first explore lists and compare them with NumPy arrays to understand when and how to use these data structures effectively.

List

A list is one of the most commonly used data types in Python. It functions as a dynamic array, meaning its size can grow or shrink as needed. Lists offer constant-time access and modification at arbitrary indices. However, inserting or removing elements anywhere other than the end, and searching for an element (checking membership), take time proportional to the length of the list, so they can become slow for large lists. The exception is insertion or removal at the end of the list, which remains efficient regardless of the list’s size.

Lists are a good choice in the following scenarios:

You need ordered data;
You frequently access or modify elements by index;
You need to store different data types (e.g., integers, strings, or custom objects);
You don't require fast membership testing or fast insertion into or removal from the middle of the list.


my_list = [10, 20, 30]
# Access an element by index
print(my_list[1])

# Modify an element at a specific index
my_list[1] = 50
print(my_list)

# Insert an element at the end of the list
my_list.append(40)
print(my_list)

# Remove an element from the end of the list
my_list.pop()
print(my_list)
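
To make the cost difference described above concrete, here is a minimal sketch that times growing a list at the end versus at the front, and then runs a membership test. The helper names `grow_at_end` and `grow_at_front` and the size `n` are illustrative assumptions, not part of the lesson, and the exact timings will vary by machine.

```python
from timeit import timeit


def grow_at_end(n):
    """Build a list by appending at the end (efficient regardless of size)."""
    items = []
    for i in range(n):
        items.append(i)
    return items


def grow_at_front(n):
    """Build a list by inserting at index 0 (every insert shifts all existing elements)."""
    items = []
    for i in range(n):
        items.insert(0, i)
    return items


n = 20_000  # illustrative size, chosen so the script finishes quickly

end_time = timeit(lambda: grow_at_end(n), number=5)
front_time = timeit(lambda: grow_at_front(n), number=5)
print(f"append at end:   {end_time:.4f} s")
print(f"insert at front: {front_time:.4f} s")

# Membership testing walks the list element by element,
# so a value that is absent forces a scan of the whole list.
items = grow_at_end(n)
print(-1 in items)
```

On a typical run the front-insertion version should take noticeably longer, and the gap widens as `n` grows.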

NumPy Array

While lists are versatile, they are not the most efficient for large-scale numerical operations. This is where NumPy arrays come into play.

NumPy arrays store values of a single type, and NumPy's operations on them run in compiled C code, making them much faster than lists for numerical operations. One key factor is vectorization, which allows an operation to be applied to an entire array at once, without an explicit Python loop. This leads to significant performance gains, especially with large datasets.

Let's look at an example of squaring each element in a list (using a list comprehension) and in a NumPy array (using vectorization):


import numpy as np
import os
# Download the course-provided helper module (decorators.py)
os.system('wget https://content-media-cdn.codefinity.com/courses/8d21890f-d960-4129-bc88-096e24211d53/section_1/chapter_3/decorators.py 2>/dev/null')
from decorators import timeit_decorator

my_list = list(range(1, 100001))
arr = np.array(my_list)

# Square every element with a list comprehension
@timeit_decorator(number=100)
def square_list(numbers_list):
    return [x ** 2 for x in numbers_list]

# Square every element with a single vectorized operation
@timeit_decorator(number=100)
def square_array(numbers_array):
    return numbers_array ** 2

squares_list = square_list(my_list)
squares_array = square_array(arr)
# Check that both approaches produce the same values
if np.array_equal(squares_array, squares_list):
    print('The array is equal to the list')

When you run this code, the reported timings should show the vectorized NumPy version finishing considerably faster than the list comprehension.
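
If the downloaded `decorators.py` helper is not available, the same comparison can be sketched with the standard library's `timeit` module instead. This is an assumed substitute for the course's `timeit_decorator`, not its implementation; the explicit `dtype=np.int64` is also an addition, made so the squares (up to 10^10) do not overflow on platforms where the default integer type is 32-bit.

```python
import timeit

import numpy as np

my_list = list(range(1, 100001))
arr = np.array(my_list, dtype=np.int64)  # 64-bit integers hold values up to 10**10 safely


def square_list(numbers_list):
    # Python-level loop: one multiplication per element
    return [x ** 2 for x in numbers_list]


def square_array(numbers_array):
    # Vectorized: a single call that processes the whole array in compiled code
    return numbers_array ** 2


# Run each function 100 times and report the average time per call
runs = 100
list_time = timeit.timeit(lambda: square_list(my_list), number=runs) / runs
array_time = timeit.timeit(lambda: square_array(arr), number=runs) / runs

print(f"list comprehension: {list_time * 1e3:.3f} ms per call")
print(f"NumPy array:        {array_time * 1e3:.3f} ms per call")

if np.array_equal(square_array(arr), square_list(my_list)):
    print('The array is equal to the list')
```

The absolute numbers depend on the machine and the NumPy version, so treat them as a relative comparison only.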

When dealing with numerical data, NumPy arrays offer a memory advantage over lists. They store actual data in contiguous memory blocks, making them more efficient, especially for large datasets. Being homogeneous (same data type), NumPy arrays avoid the overhead of object references.

In contrast, lists can be heterogeneous: a list stores references to objects in contiguous memory, while the actual objects are stored elsewhere. This flexibility introduces additional memory overhead when working with numerical data.
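
The memory difference can be estimated with a rough sketch like the one below. It assumes CPython, where `sys.getsizeof` reports the size of the list's reference array and of each integer object separately; the 64-bit `dtype` is chosen explicitly for illustration.

```python
import sys

import numpy as np

n = 100_000
my_list = list(range(1, n + 1))
arr = np.array(my_list, dtype=np.int64)

# List: the list object itself (an array of references)
# plus every integer object those references point to.
# This is an estimate; CPython shares some small integer objects.
list_bytes = sys.getsizeof(my_list) + sum(sys.getsizeof(x) for x in my_list)

# Array: one contiguous buffer holding the raw 8-byte values
array_bytes = arr.nbytes

print(f"list (references + int objects): {list_bytes:,} bytes")
print(f"NumPy array data buffer:         {array_bytes:,} bytes")
```

The array's buffer is exactly `n * 8` bytes here, while the list pays for a reference and a full Python object per element.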

To summarize, the following table compares lists with NumPy arrays:

1. You are developing a program to manage a collection of `Sensor` objects (custom class), each containing a `timestamp` (string) and a `reading` (float). The dataset will grow over time, and frequent updates to individual sensor readings are required. Which data structure would be the best choice?

2. You are working with a large numerical dataset for a machine learning project. Which data structure would provide the most efficient performance for this task?

3. You are analyzing stock market data, which consists of numerical values (prices) over time. You need to perform fast calculations, such as finding the average price and applying mathematical transformations on the data. Which data structure would you choose?


