Advanced Techniques in pandas
Get Familiar With the .groupby() Method
I am happy to see you in this section. Here, we will group our data to get information about groups of rows. Examine the dataset on flight delays:
  | Airline | Flight | AirportFrom | AirportTo | DayOfWeek | Time | Length | Delay |
0 | CO | 269 | SFO | IAH | 3 | 15 | 205 | 1 |
1 | US | 1558 | PHX | CLT | 3 | 15 | 222 | 1 |
2 | AA | 2400 | LAX | DFW | 3 | 20 | 165 | 1 |
3 | AA | 2466 | SFO | DFW | 3 | 20 | 195 | 1 |
4 | AS | 108 | ANC | SEA | 3 | 30 | 202 | 0 |
Grouping data is very useful, so let's explore it in more depth. Imagine you want to count how many flights (rows) the dataset contains for each flight number. Look at the code example and then at the explanation:
data[['Flight', 'Delay']].groupby('Flight').count()
Explanation:
data[['Flight', 'Delay']] - the columns you will work with, including the column you group by.
.groupby('Flight') - 'Flight' is the argument of the .groupby() method: the name of the column to group by. Rows that have the same value in the 'Flight' column belong to one group.
.count() - the function applied to each group. Here it counts the rows in the 'Delay' column that share the same value in the 'Flight' column. Strictly speaking, .count() counts non-missing values, so it counts every flight, whether it was delayed (Delay = 1) or not (Delay = 0). To count only the delayed flights, you could use .sum() instead, because 'Delay' holds 0/1 values.
This function is not the only one you can use. The table below lists the most common ones.
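Here is a minimal sketch of the chain from the explanation, assuming the five sample rows above are the whole dataset (the full dataset is larger, and the text does not say how it is loaded). Every flight number in the sample is unique, so each 'Flight' group has a single row. The sketch therefore also groups by 'Airline', where AA appears twice, to contrast .count() with .sum(). The variable names flights_per_number and per_airline are illustrative only.

```python
import pandas as pd

# The five sample rows from the table above (assumed to be the whole dataset here)
data = pd.DataFrame({
    'Airline': ['CO', 'US', 'AA', 'AA', 'AS'],
    'Flight': [269, 1558, 2400, 2466, 108],
    'AirportFrom': ['SFO', 'PHX', 'LAX', 'SFO', 'ANC'],
    'AirportTo': ['IAH', 'CLT', 'DFW', 'DFW', 'SEA'],
    'DayOfWeek': [3, 3, 3, 3, 3],
    'Time': [15, 15, 20, 20, 30],
    'Length': [205, 222, 165, 195, 202],
    'Delay': [1, 1, 1, 1, 0],
})

# The chain from the explanation: number of rows for each flight number
flights_per_number = data[['Flight', 'Delay']].groupby('Flight').count()
print(flights_per_number)  # every flight number has exactly 1 row in this sample

# Grouping by 'Airline' produces a group with more than one row (AA)
per_airline = data[['Airline', 'Delay']].groupby('Airline')

# count() counts all non-missing 'Delay' values: AA -> 2, AS -> 1, CO -> 1, US -> 1
print(per_airline.count())

# sum() adds up the 0/1 flags, i.e. counts only delayed flights: AA -> 2, AS -> 0
print(per_airline.sum())
```

The only difference between the two results is the AS row: its single flight is counted by .count() but contributes 0 to .sum() because it was not delayed.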
Method | What it does | Syntax |
.mean() | Calculates the mean of the values in a column within each group | data.groupby('column_name').mean() |
.median() | Calculates the median of the values in a column within each group | data.groupby('column_name').median() |
.sum() | Calculates the sum of the values in a column within each group | data.groupby('column_name').sum() |
.count() | Counts the non-missing values in a column within each group | data.groupby('column_name').count() |
.min() | Finds the minimum value in a column within each group | data.groupby('column_name').min() |
.max() | Finds the maximum value in a column within each group | data.groupby('column_name').max() |
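The functions in the table can be tried on the sample rows shown earlier. This sketch again assumes those five rows are the whole dataset and keeps only three of their columns. The name grouped is illustrative. It selects the grouping column and one numeric column before grouping. In recent pandas versions (2.0 and later), calling .mean() or .median() on groups that still contain text columns such as 'AirportFrom' raises a TypeError.

```python
import pandas as pd

# Three columns from the sample rows, including one text column
data = pd.DataFrame({
    'Airline': ['CO', 'US', 'AA', 'AA', 'AS'],
    'AirportFrom': ['SFO', 'PHX', 'LAX', 'SFO', 'ANC'],
    'Length': [205, 222, 165, 195, 202],
})

# Keep only the grouping column and the numeric column to aggregate
grouped = data[['Airline', 'Length']].groupby('Airline')

print(grouped.mean())    # AA: 180.0, AS: 202.0, CO: 205.0, US: 222.0
print(grouped.median())  # AA: 180.0 (median of 165 and 195)
print(grouped.sum())     # AA: 360
print(grouped.count())   # AA: 2, all other airlines: 1
print(grouped.min())     # AA: 165
print(grouped.max())     # AA: 195
```

Only the AA group has more than one row here, so it is the group where the six functions give visibly different results.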
Section 4. Chapter 1