Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Learn Advanced Grouping | Aggregating Data
Data Wrangling with pandas
SectionΒ 4. ChapterΒ 4
single

single

bookAdvanced Grouping

Swipe to show menu

Let's expand our knowledge on the .groupby() method. As you remember, we can use the .agg() method. The main advantage of this function is its flexibility: it allows us to apply multiple, different aggregations to multiple columns all at the same time, returning a clean summary table.

Look at the example below. We grouped flights by 'Airline'. Then, using .agg(), we counted the total number of flights (using the 'Delay' column) and simultaneously found the shortest and longest flight durations (using the 'Length' column). So convenient, isn't it?

1234
import pandas as pd data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/plane', index_col = 0) data_flights = data.groupby('Airline').agg({'Delay': 'count', 'Length': ['min', 'max']}) print(data_flights.head(10))
copy

Explanation:

.agg({'Delay': 'count', 'Length': ['min', 'max']})
  • .agg(): Short for "aggregate." This method takes the grouped data and condenses it into summary statistics based on the rules you provide.
  • {}: We use a Python dictionary to map specific columns to specific operations. The keys are the column names you want to target, and the values are the functions you want to apply.
  • 'Delay': 'count': Tells pandas to apply the count function to the 'Delay' column for each group. Notice that we pass the string alias 'count' instead of writing count(). Pandas recognizes these standard string names for common statistics.
  • 'Length': ['min', 'max']: If you want to apply more than one function to a single column, you must put the function names inside a list []. Here, we calculate both the minimum and maximum values for the 'Length' column.

Because we applied multiple functions, the resulting DataFrame will automatically create hierarchical (MultiIndex) columns. You will see Length on the top level, with min and max neatly categorized underneath it.

Task

Swipe to start coding

Your task is to dive deeper and look at the average and maximum delay times depending on the airport from which the flight started and then on the airport at which the flight ended. Also, look at the median length of the flight. Follow the algorithm:

Group data:

  • Apply the .groupby() method to the dataset data;
  • Within the .groupby() method, put the columns 'AirportFrom' and 'AirportTo'; the order is crucial;
  • Using the .agg() method, calculate the aggregated values: the average and maximum value in the column 'Time', and the median value of the column 'Length'.

Solution

Switch to desktopSwitch to desktop for real-world practiceContinue from where you are using one of the options below
Everything was clear?

How can we improve it?

Thanks for your feedback!

SectionΒ 4. ChapterΒ 4
single

single

Ask AI

expand

Ask AI

ChatGPT

Ask anything or try one of the suggested questions to begin our chat

some-alt