Summary  
This chapter covers handling missing values by explicitly substituting null entries with defaults or removing them, using fill_null and drop_nulls operations to control how nulls propagate in data.  

General domain of usage  
Data cleaning

This video demonstrates how to handle missing `steam_deck_status` values in a Polars DataFrame using `fill_null` and `drop_nulls`, and provides insight into Polars' philosophy for managing missing data efficiently.


When working with real-world data, you will often encounter missing values. In Polars, these are represented as `null`, for every data type. `NaN` is not a missing-value marker in Polars: it is an ordinary floating-point value. Handling missing values is essential for maintaining the integrity of your analysis.

Suppose you have a DataFrame with a `steam_deck_status` column, but some entries are missing. You can address these missing values in two main ways: filling them with a default value or dropping the rows entirely.

To fill missing values in the `steam_deck_status` column with the string `"Unknown"`, use the `fill_null` method:

import polars as pl

df = pl.DataFrame({
    "game": ["Portal", "Half-Life", "Aperture Desk Job", "Counter-Strike"],
    "steam_deck_status": ["Verified", None, "Playable", None]
})

# Fill nulls with "Unknown"
df_filled = df.with_columns(
    pl.col("steam_deck_status").fill_null("Unknown")
)
print(df_filled)

If you prefer to remove the rows where `steam_deck_status` is missing, use the `drop_nulls` method. Called without arguments, it drops every row that has a null in any column; passing a column name restricts the check to that column:

# Drop rows where steam_deck_status is null
df_no_nulls = df.drop_nulls("steam_deck_status")
print(df_no_nulls)

Polars is designed to handle missing data efficiently and explicitly. Unlike some libraries that treat missing values as a special floating-point value (`NaN`), Polars uses `null` as a clear signal of missingness, regardless of data type. This approach avoids ambiguity and ensures that missing data is handled consistently across columns, whether they contain **strings**, **numbers**, or **dates**.

`pl.col("steam_deck_status").fill_null("Unknown")`;

Nulls in `steam_deck_status` replaced by "Unknown";

`df.drop_nulls("steam_deck_status")` removes rows with nulls in the specified column;

Rows with nulls in `steam_deck_status` are gone;

Consistent and efficient missing data handling.

Which method would you use to replace missing values in the `steam_deck_status` column with `"Unknown"`?

Dive into the fundamentals of data wrangling with Polars, using real-world Steam game datasets. Learn Polars' columnar paradigm, selection, aggregation, joining, reshaping, and essential string, date, and missing data operations.

Explore the core differences between Polars and traditional row-based DataFrames, focusing on columnar operations, selection, and conditional logic using the Steam games dataset.

Gain a solid understanding of Polars' parallel group-by, aggregation, joining, and reshaping capabilities using the Steam games and spy insights datasets.

Explore the essentials of data cleaning: text normalization, date parsing, and robust handling of missing values in Polars.

Handling Nulls