# Ultimate NumPy

## Handling Missing Values

In real-world data analysis, it's common to encounter datasets with **missing** or **incomplete** information. Missing values can significantly impact the quality and reliability of your analysis, so it’s important to identify such values and deal with them.

### Identifying Missing Values

Before addressing missing data, it's essential to recognize how missing values are represented in **NumPy** arrays. Typically, **NumPy** uses the special value `numpy.nan` to denote missing or undefined data.

To identify and locate missing values within a **NumPy** array, you can use the `numpy.isnan()` function.
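As a minimal sketch of how this might look in practice (the sample array and variable names below are made up for illustration):

```python
import numpy as np

# Hypothetical sample data: two of the five entries are missing
array = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

# True marks the positions that hold numpy.nan
missing_mask = np.isnan(array)
print(missing_mask)

# Count the missing values and find their indices
missing_count = np.sum(missing_mask)
missing_positions = np.flatnonzero(missing_mask)
print(missing_count)
print(missing_positions)
```

Note that comparing with `==` does not work here, because `numpy.nan == numpy.nan` evaluates to `False`.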

This function takes an array as its argument and returns a **boolean array** in which `True` marks each `numpy.nan` value.

### Dealing with Missing Values

Once you've identified missing values, you can choose from several strategies to handle them. We will discuss two of the most common ones:

**1. Removing Missing Values**

You can remove missing values, or entire rows or columns that contain them, by applying **boolean indexing** with the mask returned by `numpy.isnan()`.
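The following sketch shows one way this could be done for a 1D array and for the rows and columns of a 2D array. The sample data and variable names are assumptions chosen for illustration:

```python
import numpy as np

# Hypothetical 1D sample data
array = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

# Keep only the elements that are not NaN
cleaned = array[~np.isnan(array)]
print(cleaned)

# Hypothetical 2D sample data: the second row contains a missing value
matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, np.nan, 6.0],
                   [7.0, 8.0, 9.0]])

# Flag rows with at least one NaN, then keep the remaining rows
rows_with_nan = np.isnan(matrix).any(axis=1)
complete_rows = matrix[~rows_with_nan]
print(complete_rows)

# Same idea for columns: flag columns with at least one NaN
cols_with_nan = np.isnan(matrix).any(axis=0)
complete_cols = matrix[:, ~cols_with_nan]
print(complete_cols)
```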

Note

The tilde (`~`) symbol is the bitwise NOT operator. Applied to a boolean array, it turns `True` values into `False` and vice versa, so all non-missing values are marked as `True` in the resulting boolean array.

**2. Filling Missing Values**

You can replace missing values with specific values. The **mean** or **median** of the non-missing values is most commonly used for this purpose. They can be calculated using the `numpy.nanmean()` and `numpy.nanmedian()` functions, respectively.
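A minimal sketch of filling with the mean and with the median, assuming a small made-up 1D array:

```python
import numpy as np

# Hypothetical sample data with two missing values
array = np.array([1.0, np.nan, 3.0, np.nan, 8.0])

# Statistics computed over the non-missing values only (1, 3, 8)
mean_value = np.nanmean(array)      # 4.0
median_value = np.nanmedian(array)  # 3.0

# Work on copies so the original array stays untouched
filled_with_mean = array.copy()
filled_with_mean[np.isnan(filled_with_mean)] = mean_value

filled_with_median = array.copy()
filled_with_median[np.isnan(filled_with_median)] = median_value

print(filled_with_mean)
print(filled_with_median)
```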

The missing values are then replaced with the calculated value using **boolean indexing**.

When dealing with **higher-dimensional** arrays, both of these functions calculate their respective statistics for **all** non-missing values in a **flattened** array by default (`axis=None`). You can set the `axis` parameter yourself to calculate the statistics along the specified axis.
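To illustrate the effect of `axis`, here is a short sketch on a made-up 2D array (the data and variable names are assumptions):

```python
import numpy as np

# Hypothetical 2D sample data: 2 rows, 3 columns
matrix = np.array([[1.0, np.nan, 3.0],
                   [4.0, 6.0, np.nan]])

# Default (axis=None): all non-missing values, (1 + 3 + 4 + 6) / 4 = 3.5
overall_mean = np.nanmean(matrix)

# axis=0: one value per column -> [2.5, 6.0, 3.0]
column_means = np.nanmean(matrix, axis=0)

# axis=1: one value per row -> [2.0, 5.0]
row_means = np.nanmean(matrix, axis=1)
row_medians = np.nanmedian(matrix, axis=1)

print(overall_mean)
print(column_means)
print(row_means)
print(row_medians)
```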

In case you want to explore all parameters of these functions, you can refer to their documentation: numpy.nanmean, numpy.nanmedian.

# Task

`temperature_data` is a 2D array of daily temperatures in two cities for three days. Your task is the following:

- Use the correct function to calculate the mean of non-missing values for every city.
- Specify the second keyword argument correctly to calculate the mean for each row separately.
- Replace missing values with `average_temperatures` using boolean indexing.
