Handling Nulls
When working with real-world data, you will often encounter missing or null values. In Polars, these are represented as null rather than NaN or other placeholders. Handling missing values is essential for maintaining the integrity of your analysis.
Suppose you have a DataFrame with a steam_deck_status column, but some entries are missing. You can address these missing values in two main ways: filling them with a default value or dropping the rows entirely.
To fill missing values in the steam_deck_status column with the string "Unknown", use the fill_null method:
```python
import polars as pl

df = pl.DataFrame({
    "game": ["Portal", "Half-Life", "Aperture Desk Job", "Counter-Strike"],
    "steam_deck_status": ["Verified", None, "Playable", None]
})

# Fill nulls with "Unknown"
df_filled = df.with_columns(
    pl.col("steam_deck_status").fill_null("Unknown")
)
print(df_filled)
```
If you prefer to remove rows where steam_deck_status is missing, use the drop_nulls method. Called with no arguments, it drops every row that has a null in any column; passing a column name (or a list of names) restricts the check to those columns:
```python
# Drop rows where steam_deck_status is null
df_no_nulls = df.drop_nulls("steam_deck_status")
print(df_no_nulls)
```
Polars is designed to handle missing data efficiently and explicitly. Unlike some libraries that treat missing values as a special floating point value (NaN), Polars uses null as a clear signal of missingness, regardless of data type. This approach avoids ambiguity and ensures that missing data is handled consistently across columns, whether they contain strings, numbers, or dates.