single
Advanced Grouping
Swipe to show menu
Let's expand our knowledge on the .groupby() method. As you remember, we can use the .agg() method. The main advantage of this function is its flexibility: it allows us to apply multiple, different aggregations to multiple columns all at the same time, returning a clean summary table.
Look at the example below. We grouped flights by 'Airline'. Then, using .agg(), we counted the total number of flights (using the 'Delay' column) and simultaneously found the shortest and longest flight durations (using the 'Length' column). So convenient, isn't it?
1234import pandas as pd data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/plane', index_col = 0) data_flights = data.groupby('Airline').agg({'Delay': 'count', 'Length': ['min', 'max']}) print(data_flights.head(10))
Explanation:
.agg({'Delay': 'count', 'Length': ['min', 'max']})
.agg(): Short for "aggregate." This method takes the grouped data and condenses it into summary statistics based on the rules you provide.{}: We use a Python dictionary to map specific columns to specific operations. The keys are the column names you want to target, and the values are the functions you want to apply.'Delay': 'count': Tells pandas to apply the count function to the'Delay'column for each group. Notice that we pass the string alias'count'instead of writingcount(). Pandas recognizes these standard string names for common statistics.'Length': ['min', 'max']: If you want to apply more than one function to a single column, you must put the function names inside a list[]. Here, we calculate both the minimum and maximum values for the'Length'column.
Because we applied multiple functions, the resulting DataFrame will automatically create hierarchical (MultiIndex) columns. You will see Length on the top level, with min and max neatly categorized underneath it.
Swipe to start coding
Your task is to dive deeper and look at the average and maximum delay times depending on the airport from which the flight started and then on the airport at which the flight ended. Also, look at the median length of the flight. Follow the algorithm:
Group data:
- Apply the
.groupby()method to the datasetdata; - Within the
.groupby()method, put the columns'AirportFrom'and'AirportTo'; the order is crucial; - Using the
.agg()method, calculate the aggregated values: the average and maximum value in the column'Time', and the median value of the column'Length'.
Solution
Thanks for your feedback!
single
Ask AI
Ask AI
Ask anything or try one of the suggested questions to begin our chat