Get Familiar With the .groupby() Method | Aggregating Data
Advanced Techniques in pandas

Get Familiar With the .groupby() Method

In this section, we will group our data to find information about different groups of rows. Examine the data set on delays:

   Airline  Flight  AirportFrom  AirportTo  DayOfWeek  Time  Length  Delay
0  CO       269     SFO          IAH        3          15    205     1
1  US       1558    PHX          CLT        3          15    222     0
2  AA       2400    LAX          DFW        3          20    165     1
3  AA       2466    SFO          DFW        3          20    195     1
4  AS       108     ANC          SEA        3          30    202     0

Grouping lets you compute a summary value for each group of rows. Imagine you want to calculate the number of delays for each flight number. The expression data[['Flight', 'Delay']].groupby('Flight').sum() does this; each part is explained below.

Explanation:

  • data[['Flight', 'Delay']] - These are the columns you will work with, including the column you will group by;
  • groupby('Flight') - The 'Flight' column is the argument of the .groupby() method. This means that rows with the same value in the 'Flight' column are grouped together;
  • .sum() - This method operates on the rows within each group created by .groupby(). In this case, it sums the values in the 'Delay' column for rows that belong to the same 'Flight' group.
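
The three parts above can be put together in a short sketch. It assumes the five sample rows from the table are rebuilt by hand as a DataFrame named data; the full delays data set used in the course is not shown here, so the numbers below describe this sample only.

```python
import pandas as pd

# Assumption: only the five sample rows shown in the table above,
# not the full delays data set used in the course.
data = pd.DataFrame({
    'Airline': ['CO', 'US', 'AA', 'AA', 'AS'],
    'Flight': [269, 1558, 2400, 2466, 108],
    'AirportFrom': ['SFO', 'PHX', 'LAX', 'SFO', 'ANC'],
    'AirportTo': ['IAH', 'CLT', 'DFW', 'DFW', 'SEA'],
    'DayOfWeek': [3, 3, 3, 3, 3],
    'Time': [15, 15, 20, 20, 30],
    'Length': [205, 222, 165, 195, 202],
    'Delay': [1, 0, 1, 1, 0],
})

# Select the grouping column and the column to aggregate,
# group rows by flight number, then sum 'Delay' within each group.
delays_per_flight = data[['Flight', 'Delay']].groupby('Flight').sum()
print(delays_per_flight)

# The same pattern with a different grouping column:
# both AA rows fall into one group, so their 'Delay' values are added.
delays_per_airline = data[['Airline', 'Delay']].groupby('Airline').sum()
print(delays_per_airline)
```

In this small sample every flight number appears only once, so each 'Flight' group holds a single row. Grouping by 'Airline' makes the effect easier to see, because the two AA rows collapse into one group.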

Note

Since the 'Delay' column contains only 0 (no delay occurred) or 1 (a delay occurred), the sum of the 'Delay' values in each group is the number of delays for that flight.

In fact, .sum() is only one of many aggregation functions you can use. The most common ones are listed below, and you will become familiar with them as you proceed.

Function   Description                                                      Syntax
.mean()    Finds the mean of the values in each column within each group    data.groupby('column_name').mean()
.median()  Finds the median of the values in each column within each group  data.groupby('column_name').median()
.sum()     Finds the sum of the values in each column within each group     data.groupby('column_name').sum()
.count()   Counts the non-null values in each column within each group      data.groupby('column_name').count()
.min()     Finds the minimum value in each column within each group         data.groupby('column_name').min()
.max()     Finds the maximum value in each column within each group         data.groupby('column_name').max()
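
As an illustration of the functions in the table, the sketch below applies several of them to the same grouping. It assumes a hand-built DataFrame holding only the 'Airline' and 'Length' values of the five sample rows, so the results describe that sample and not the full data set.

```python
import pandas as pd

# Assumption: 'Airline' and 'Length' values of the five sample rows only.
data = pd.DataFrame({
    'Airline': ['CO', 'US', 'AA', 'AA', 'AS'],
    'Length': [205, 222, 165, 195, 202],
})

# Build the grouping once and reuse it for each aggregation.
grouped = data[['Airline', 'Length']].groupby('Airline')

mean_length = grouped.mean()      # average 'Length' per airline
median_length = grouped.median()  # middle 'Length' value per airline
count_length = grouped.count()    # number of non-null 'Length' values per airline
min_length = grouped.min()        # shortest 'Length' per airline
max_length = grouped.max()        # longest 'Length' per airline

print(mean_length)
print(count_length)
```

Each result is a DataFrame indexed by 'Airline'. For the AA group, which holds the two rows with lengths 165 and 195, the mean is 180.0 and the count is 2.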

Fill in the gaps to find the mean value of the 'Time' column for each value of the 'DayOfWeek' column.

data_extracted = data[['___', 'Time']].___('___').mean()
print(data_extracted)
DayOfWeek        Time
3          804.993130
4          804.452984
5          702.888362


Section 4. Chapter 1