Advanced Techniques in pandas

Get Familiar With the .groupby() Method

I am happy to see you in this section. Here, we will group the rows of our data and get information about each group. Examine the data set on flight delays:

   Airline  Flight  AirportFrom  AirportTo  DayOfWeek  Time  Length  Delay
0       CO     269          SFO        IAH          3    15     205      1
1       US    1558          PHX        CLT          3    15     222      0
2       AA    2400          LAX        DFW          3    20     165      1
3       AA    2466          SFO        DFW          3    20     195      1
4       AS     108          ANC        SEA          3    30     202      0

Grouping data is useful, and now we will look at it more closely. Imagine you want to know how many rows the data set contains for each flight number. You can get this with the following expression:

data[['Flight', 'Delay']].groupby('Flight').count()

Explanation:

  • data[['Flight', 'Delay']] - selects the columns you will work with, including the column you group by.
  • groupby('Flight') - 'Flight' is the argument of the .groupby() method: the name of the column to group by. All rows that have the same value in the 'Flight' column belong to one group.
  • .count() - a method applied to each group separately. Here it counts, for each flight number, the rows that have a non-missing value in the 'Delay' column. Note that this is the number of rows, not the number of delays: if 'Delay' holds 1 for a delayed flight and 0 otherwise, as the sample suggests, .sum() would count the delayed flights instead. .count() is not the only method you can use; the most common ones are listed below.
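A minimal sketch of the expression explained above, assuming the data set is a pandas DataFrame named data. Here it is rebuilt by hand from only the five sample rows shown earlier, so every count is much smaller than on the full data set. Because each flight number appears once in this sample, the code also groups by 'Airline' so that one group (AA) contains two rows, and it contrasts .count() with .sum() under the assumption that 'Delay' is 1 for a delayed flight and 0 otherwise.

```python
import pandas as pd

# Only the five sample rows shown above; the real data set has many more.
data = pd.DataFrame({
    'Airline': ['CO', 'US', 'AA', 'AA', 'AS'],
    'Flight': [269, 1558, 2400, 2466, 108],
    'AirportFrom': ['SFO', 'PHX', 'LAX', 'SFO', 'ANC'],
    'AirportTo': ['IAH', 'CLT', 'DFW', 'DFW', 'SEA'],
    'DayOfWeek': [3, 3, 3, 3, 3],
    'Time': [15, 15, 20, 20, 30],
    'Length': [205, 222, 165, 195, 202],
    'Delay': [1, 0, 1, 1, 0],
})

# Number of rows per flight number (each flight appears once in this sample).
rows_per_flight = data[['Flight', 'Delay']].groupby('Flight').count()
print(rows_per_flight)

# Grouping by 'Airline' produces a group with more than one row (AA).
rows_per_airline = data[['Airline', 'Delay']].groupby('Airline').count()

# Assumption: Delay is 1 for a delayed flight and 0 otherwise,
# so summing it counts the delayed flights in each group.
delays_per_airline = data[['Airline', 'Delay']].groupby('Airline').sum()

print(rows_per_airline)
print(delays_per_airline)
```

The group keys become the index of the result and are sorted by default, which is why the airlines come out in alphabetical order.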
Function   What it computes for each group   Syntax
.mean()    Mean of the values                data.groupby('column_name').mean()
.median()  Median of the values              data.groupby('column_name').median()
.sum()     Sum of the values                 data.groupby('column_name').sum()
.count()   Number of non-missing values      data.groupby('column_name').count()
.min()     Minimum value                     data.groupby('column_name').min()
.max()     Maximum value                     data.groupby('column_name').max()
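The sketch below applies every method from the table to the 'Length' column for each airline. As an assumption, it again uses only the five sample rows, rebuilt by hand. It calls .agg() with a list of method names to compute all six results in one call; calling each method separately, as in the table, gives the same numbers. It also passes numeric_only=True to .mean(): in pandas 2.0 and later, .mean() or .median() on a grouped DataFrame that still contains text columns such as 'AirportFrom' raises a TypeError, so either select the numeric columns first (as the examples in this chapter do) or pass that parameter.

```python
import pandas as pd

# Only the five sample rows shown above; the real data set has many more.
data = pd.DataFrame({
    'Airline': ['CO', 'US', 'AA', 'AA', 'AS'],
    'Flight': [269, 1558, 2400, 2466, 108],
    'AirportFrom': ['SFO', 'PHX', 'LAX', 'SFO', 'ANC'],
    'AirportTo': ['IAH', 'CLT', 'DFW', 'DFW', 'SEA'],
    'DayOfWeek': [3, 3, 3, 3, 3],
    'Time': [15, 15, 20, 20, 30],
    'Length': [205, 222, 165, 195, 202],
    'Delay': [1, 0, 1, 1, 0],
})

# All six aggregations from the table, applied to 'Length' per airline.
length_stats = data.groupby('Airline')['Length'].agg(
    ['mean', 'median', 'sum', 'count', 'min', 'max']
)
print(length_stats)

# One method on its own, after selecting the columns to work with.
mean_length = data[['Airline', 'Length']].groupby('Airline').mean()
print(mean_length)

# Grouping the whole DataFrame: numeric_only=True skips the text columns
# ('AirportFrom', 'AirportTo') that .mean() cannot average.
numeric_means = data.groupby('Airline').mean(numeric_only=True)
print(numeric_means)
```

Selecting a single column with ['Length'] returns one column per aggregation, while selecting a list of columns with data[[...]] keeps a DataFrame with one column per selected field.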

Fill in the gaps to find the mean value of the `'Time'` column for each value of the `'DayOfWeek'` column.

data_extracted = data[['___', 'Time']].___('___').mean()
print(data_extracted)
                 Time
DayOfWeek
3          804.993130
4          804.452984
5          702.888362


Section 4. Chapter 1