Learn Complicated Grouping | Aggregating Data
Advanced Techniques in pandas

Complicated Grouping

Sometimes the built-in pandas aggregation functions, such as .mean() or .min(), are not enough when grouping.

Look at the 'Length' column, which holds the flight length in minutes. Suppose we want the maximum flight length in hours for each combination of values in the 'Flight' and 'Airline' columns. To get it, we compute the maximum of 'Length' within each group and divide it by 60. See the example and the explanation below.

import pandas as pd

# Load the flights dataset, using the first column as the index
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/plane', index_col=0)
# For each (Flight, Airline) group, take the longest flight and convert minutes to hours
data_flights = data[['Flight', 'Airline', 'Length']].groupby(['Flight', 'Airline']).apply(lambda x: x['Length'].max() / 60)
print(data_flights.head(10))

Explanation:

This example extends the one from the previous chapters. The grouping itself works as before, so let's focus on the .apply() method.

.apply(lambda x: x['Length'].max()/60)
  • .apply() calls the given function once for each group, passing that group's rows as a DataFrame;
  • in the lambda function, x is the argument (the rows of one group) and x['Length'].max()/60 is the expression. So, for each group key, the function finds the maximum 'Length' and divides it by 60 to convert minutes to hours.
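
To see what the lambda computes without downloading the dataset, here is a minimal sketch on a small toy DataFrame. The values and the names toy, via_apply, and via_builtin are invented for illustration and are not part of the course data. It selects 'Length' before calling .apply(), so the lambda receives a Series rather than a DataFrame, and compares the result with the built-in .max().

```python
import pandas as pd

# Toy data invented for illustration; not taken from the course dataset.
toy = pd.DataFrame({
    'Flight': [100, 100, 100, 200, 200],
    'Airline': ['AA', 'AA', 'DL', 'AA', 'AA'],
    'Length': [120, 180, 90, 60, 240],  # flight length in minutes
})

grouped = toy.groupby(['Flight', 'Airline'])

# Custom function: called once per group with that group's 'Length' values
via_apply = grouped['Length'].apply(lambda s: s.max() / 60)

# Same calculation using the built-in aggregation, then converting to hours
via_builtin = grouped['Length'].max() / 60

print(via_apply)
print(via_apply.equals(via_builtin))
```

For a calculation this simple the built-in form is enough; .apply() becomes useful when the per-group logic cannot be expressed with a single built-in function.
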
Task

Your task here is to analyze flight durations considering airport, airline, and weekday. You will group the data to determine the minimum total flight time (the sum of 'Length' and 'Time') for each unique combination of departure airport, airline, and weekday.

Follow the algorithm step by step:

  1. Store the list of columns 'AirportFrom', 'Airline', 'DayOfWeek', 'Time', and 'Length' (in this order) in the variable columns.
  2. Extract these columns from data using bracket notation (data[columns]).
  3. Group the dataset by 'AirportFrom', 'Airline', and 'DayOfWeek' (in this exact order).
  4. On the result of .groupby(), call .apply() with a function that adds the 'Length' and 'Time' columns row by row within each group and then returns the minimum of these sums.
  5. Assign the result to a variable named data_flights.
  6. Output the first 10 rows of the resulting Series using .head(10).
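
The steps above can be sketched roughly as follows. This is one possible reading of the task, not the official solution, and it runs on a small toy DataFrame with made-up values instead of the course dataset.

```python
import pandas as pd

# Toy stand-in for the course dataset; all values are made up for illustration.
data = pd.DataFrame({
    'AirportFrom': ['ATL', 'ATL', 'ATL', 'JFK', 'JFK'],
    'Airline': ['DL', 'DL', 'DL', 'AA', 'AA'],
    'DayOfWeek': [1, 1, 2, 1, 1],
    'Time': [300, 600, 420, 500, 480],
    'Length': [120, 90, 200, 150, 100],
})

# Steps 1-2: pick the columns of interest
columns = ['AirportFrom', 'Airline', 'DayOfWeek', 'Time', 'Length']

# Steps 3-5: group, add 'Length' and 'Time' row by row, keep the smallest sum per group
data_flights = (
    data[columns]
    .groupby(['AirportFrom', 'Airline', 'DayOfWeek'])
    .apply(lambda x: (x['Length'] + x['Time']).min())
)

# Step 6: show the first rows of the resulting Series
print(data_flights.head(10))
```

Some recent pandas versions may emit a deprecation warning here because the grouping columns are still present in the selected data; the result is unaffected.
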
