Filtering & Conditional Logic
Filtering data is a core part of data wrangling, especially when you want to focus on a specific subset of your dataset. In Polars, you pass a Boolean expression to filter(), which keeps only the rows where that expression evaluates to true. Suppose you have a DataFrame called games_df with a price column. To filter for games where the price is greater than 20, you can use the following approach:
```python
import polars as pl

# Example DataFrame
games_df = pl.DataFrame({
    "name": ["Chess", "Monopoly", "Scrabble", "Catan", "Pandemic"],
    "price": [10, 25, 15, 35, 22]
})

# Filter games with price > 20
filtered_df = games_df.filter(pl.col("price") > 20)
print(filtered_df)
```
In this example, only the games with a price above 20 are included in filtered_df.
You can also use conditional logic to create new columns based on the values of existing columns. The pl.when().then().otherwise() construct allows you to categorize data efficiently. For instance, you might want to classify each game into a price tier: "Budget" for games priced at 15 or less, "Standard" for prices above 15 and up to 30, and "Premium" for prices above 30. Here is how you can add a price_tier column to your DataFrame:
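```python
# Wrap string literals in pl.lit(); a bare string passed to then() or
# otherwise() is interpreted as a column name, not as a value
games_with_tier = games_df.with_columns(
    pl.when(pl.col("price") <= 15)
    .then(pl.lit("Budget"))
    .when((pl.col("price") > 15) & (pl.col("price") <= 30))
    .then(pl.lit("Standard"))
    .otherwise(pl.lit("Premium"))
    .alias("price_tier")
)
print(games_with_tier)
```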
This approach assigns each game to a tier based on its price, making it easy to segment your dataset for further analysis or visualization.