Data Wrangling with Polars

group_by & Aggregations


Grouping and aggregating data is a core part of data wrangling, especially when you want to summarize information by category. Polars runs group-by operations and their aggregations in parallel, which makes them well suited to large datasets. Suppose you have a DataFrame called games_df with columns such as developer, price, positive_reviews, and negative_reviews, and you want the average price and the total number of reviews for each developer. In Polars, you do this by calling the group_by method and passing aggregation expressions, such as mean and sum, to agg.

Here's how you can group games_df by the developer column, calculate the average price, and compute the total number of reviews (positive and negative reviews combined):

```python
import polars as pl

# Sample data
games_df = pl.DataFrame({
    "developer": ["DevA", "DevB", "DevA", "DevC", "DevB"],
    "price": [10.0, 20.0, 15.0, 30.0, 25.0],
    "positive_reviews": [100, 150, 200, 80, 120],
    "negative_reviews": [10, 20, 15, 5, 8]
})

# Group by developer, calculate average price and total reviews
result = (
    games_df
    # maintain_order=True keeps groups in order of first appearance;
    # without it, the row order of the result is not guaranteed
    .group_by("developer", maintain_order=True)
    .agg([
        pl.col("price").mean().alias("avg_price"),
        # Add the two review columns row by row, then sum within each group
        (pl.col("positive_reviews") + pl.col("negative_reviews")).sum().alias("total_reviews")
    ])
)
print(result)
```

Which Polars method allows you to group a DataFrame by a column and perform aggregations?


Section 2. Chapter 1

