Get Familiar With the .groupby() Method | Aggregating Data

## Get Familiar With the .groupby() Method

Welcome to this section. Here, we will group data to summarize information about different groups of rows. Examine the data set on flight delays:

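| | Airline | Flight | AirportFrom | AirportTo | DayOfWeek | Time | Length | Delay |
|---|---|---|---|---|---|---|---|---|
| 0 | CO | 269 | SFO | IAH | 3 | 15 | 205 | 1 |
| 1 | US | 1558 | PHX | CLT | 3 | 15 | 222 | 0 |
| 2 | AA | 2400 | LAX | DFW | 3 | 20 | 165 | 1 |
| 3 | AA | 2466 | SFO | DFW | 3 | 20 | 195 | 1 |
| 4 | AS | 108 | ANC | SEA | 3 | 30 | 202 | 0 |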

Grouping lets you compute a summary value for each category of rows. Imagine you want to calculate the number of delays for each flight number. The expression `data[['Flight', 'Delay']].groupby('Flight').sum()` does this; each part is explained below.

Explanation:

• `data[['Flight', 'Delay']]` - These are the columns you will work with, including the column you will group by;
• `groupby('Flight')` - The `'Flight'` column is the argument of the `.groupby()` method. This means that rows with the same value in the `'Flight'` column are placed in the same group;
• `.sum()` - This method operates on the rows within each group created by `.groupby()`. In this case, it sums the values in the `'Delay'` column for rows that belong to the same `'Flight'` group.
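The three steps described above can be sketched as a runnable example. This is a minimal illustration, not the course's own code: the small `data` frame is a hypothetical sample that reuses a few values from the table and adds one invented repeat of flight `269`, so that at least one group contains more than one row.

```python
import pandas as pd

# Hypothetical sample: flight numbers and delay flags similar to the table,
# plus one invented extra row for flight 269 (assumption, for illustration).
data = pd.DataFrame({
    'Airline': ['CO', 'US', 'AA', 'AA', 'AS', 'CO'],
    'Flight': [269, 1558, 2400, 2466, 108, 269],
    'Delay': [1, 0, 1, 1, 0, 1],
})

# 1. Keep only the grouping column and the column to aggregate.
# 2. Put rows with the same 'Flight' value into one group.
# 3. Sum the 0/1 'Delay' flags within each group.
delays_per_flight = data[['Flight', 'Delay']].groupby('Flight').sum()

print(delays_per_flight)
```

In this sample, flight `269` appears twice with a delay each time, so its group sums to 2. The result is indexed by `'Flight'`, with the groups sorted by flight number by default.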

Note

Since the `'Delay'` column contains only `0` (no delay occurred) or `1` (a delay occurred), the sum over each group equals the number of delays for that flight.

In fact, `.sum()` is one of many aggregation functions you can use. You will become familiar with all of them as you proceed.

| Function | Description | Syntax |
|---|---|---|
| `.mean()` | Finds the mean of the values in a column that belong to one group | `data.groupby('column_name').mean()` |
| `.median()` | Finds the median of the values in a column that belong to one group | `data.groupby('column_name').median()` |
| `.sum()` | Finds the sum of the values in a column that belong to one group | `data.groupby('column_name').sum()` |
| `.count()` | Finds the number of non-null values in a column that belong to one group | `data.groupby('column_name').count()` |
| `.min()` | Finds the minimum of the values in a column that belong to one group | `data.groupby('column_name').min()` |
| `.max()` | Finds the maximum of the values in a column that belong to one group | `data.groupby('column_name').max()` |
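To see these functions side by side, here is a small sketch that uses only the `'Airline'` and `'Length'` values from the five rows shown earlier. It selects the numeric column before aggregating, because in recent pandas versions calling `.mean()` on a grouped frame that still contains text columns may raise an error. The variable names `grouped` and `summary` are illustrative, and `.agg()` with a list of function names is just one convenient way to apply several aggregations at once.

```python
import pandas as pd

# 'Airline' and 'Length' values taken from the five sample rows above.
data = pd.DataFrame({
    'Airline': ['CO', 'US', 'AA', 'AA', 'AS'],
    'Length': [205, 222, 165, 195, 202],
})

# Group by airline and select the numeric column to aggregate.
grouped = data.groupby('Airline')['Length']

# Each aggregation function can be called on its own...
mean_length = grouped.mean()
max_length = grouped.max()

# ...or several can be applied in one call with .agg().
summary = grouped.agg(['mean', 'median', 'sum', 'count', 'min', 'max'])

print(summary)
```

Only `AA` has two rows in this sample (lengths 165 and 195), so it is the only group where the mean, minimum, and maximum differ from one another.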

# Fill in the gaps to find the mean value of the `'Time'` column depending on the `'DayOfWeek'` column.

`data_extracted = data[['___', 'Time']]___('___').mean()`

`print(data_extracted)`

Output:

| DayOfWeek | Time |
|---|---|
| 3 | 804.993130 |
| 4 | 804.452984 |
| 5 | 702.888362 |

Section 4. Chapter 1
