Get Familiar With the .groupby() Method
Welcome to this section. Here, we will group our data to find information about different groups of rows. Examine the data set on flight delays:
| | Airline | Flight | AirportFrom | AirportTo | DayOfWeek | Time | Length | Delay |
|---|---|---|---|---|---|---|---|---|
| 0 | CO | 269 | SFO | IAH | 3 | 15 | 205 | 1 |
| 1 | US | 1558 | PHX | CLT | 3 | 15 | 222 | 0 |
| 2 | AA | 2400 | LAX | DFW | 3 | 20 | 165 | 1 |
| 3 | AA | 2466 | SFO | DFW | 3 | 20 | 195 | 1 |
| 4 | AS | 108 | ANC | SEA | 3 | 30 | 202 | 0 |
Grouping lets you summarize rows that share a value, and now we will dive deeper into it. Imagine you want to calculate the number of delays for each flight number. The expression `data[['Flight', 'Delay']].groupby('Flight').sum()` does this, and each of its parts is explained below.
Explanation:
- `data[['Flight', 'Delay']]`: these are the columns you will work with, including the column you will group by;
- `.groupby('Flight')`: the `'Flight'` column is the argument for the `.groupby()` method. This means that rows with the same value in the `'Flight'` column will be grouped together;
- `.sum()`: this method operates on the rows within each group created by `.groupby()`. In this case, it sums the values in the `'Delay'` column for rows that belong to the same `'Flight'` group.
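The three steps above can be tried on a small frame. This is a minimal sketch: the rows below are hypothetical values shaped like the delays table, not the real data set, and the variable name `delays_per_flight` is chosen only for illustration.

```python
import pandas as pd

# Hypothetical rows shaped like the delays table (not the real data set)
data = pd.DataFrame({
    'Airline': ['CO', 'CO', 'AA', 'AA', 'AA'],
    'Flight': [269, 269, 2400, 2400, 2466],
    'Delay': [1, 1, 1, 0, 0],
})

# 1. Keep the grouping column and the column to aggregate
# 2. Group rows that share the same flight number
# 3. Sum the 'Delay' values within each group
delays_per_flight = data[['Flight', 'Delay']].groupby('Flight').sum()

print(delays_per_flight)
```

The result is a DataFrame indexed by `'Flight'`, with one row per distinct flight number and a single `'Delay'` column holding the per-group sum.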
Note
Since the `'Delay'` column contains only `0` (no delay occurred) or `1` (a delay occurred) as its possible values, the sum for each group represents the number of delays for that flight.
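One way to convince yourself of this is to count the delayed rows directly and compare the two results. The sketch below again assumes hypothetical rows; filtering first and then counting group sizes is just one possible cross-check, not the course's own method.

```python
import pandas as pd

# Hypothetical rows (not the real data set)
data = pd.DataFrame({
    'Flight': [269, 269, 2400, 2400, 2466],
    'Delay': [1, 1, 1, 0, 0],
})

# Sum of the 0/1 flags per flight
delay_sum = data.groupby('Flight')['Delay'].sum()

# Cross-check: keep only delayed rows, then count the rows per flight.
# Flights with no delayed rows vanish after filtering, so restore them with 0.
delayed_rows = (
    data[data['Delay'] == 1]
    .groupby('Flight')
    .size()
    .reindex(delay_sum.index, fill_value=0)
)

same = bool((delay_sum == delayed_rows).all())
print(same)
```

The equality holds only because every value in `'Delay'` is `0` or `1`. With any other values, the sum would no longer be a count.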
In fact, `.sum()` is one of many aggregation functions you can use. You will become familiar with them as you proceed.
Function | Description | Syntax |
---|---|---|
`.mean()` | Finds the mean of all values in a column that belong to one group | `data.groupby('column_name').mean()` |
`.median()` | Finds the median of all values in a column that belong to one group | `data.groupby('column_name').median()` |
`.sum()` | Finds the sum of all values in a column that belong to one group | `data.groupby('column_name').sum()` |
`.count()` | Finds the number of non-null values in a column that belong to one group | `data.groupby('column_name').count()` |
`.min()` | Finds the minimum of all values in a column that belong to one group | `data.groupby('column_name').min()` |
`.max()` | Finds the maximum of all values in a column that belong to one group | `data.groupby('column_name').max()` |
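To see how the functions in the table differ, the sketch below applies each of them to the same groups. The rows are hypothetical, and collecting the results into a `summary` frame is only a convenience for comparing them side by side.

```python
import pandas as pd

# Hypothetical rows (not the real data set)
data = pd.DataFrame({
    'Airline': ['AA', 'AA', 'AA', 'CO', 'CO'],
    'Length': [165, 195, 210, 205, 185],
})

# Group once, then select the column to aggregate
length_by_airline = data.groupby('Airline')['Length']

# Each call returns one value per airline
summary = pd.DataFrame({
    'mean': length_by_airline.mean(),
    'median': length_by_airline.median(),
    'sum': length_by_airline.sum(),
    'count': length_by_airline.count(),
    'min': length_by_airline.min(),
    'max': length_by_airline.max(),
})

print(summary)
```

Each column of `summary` comes from one function in the table, and each row corresponds to one `'Airline'` group.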