Get Familiar With the .groupby() Method

I am happy to see you in this section. Here, we will group our data to extract information about groups of rows. Examine the flight delays data set:

   Airline  Flight  AirportFrom  AirportTo  DayOfWeek  Time  Length  Delay
0       CO     269          SFO        IAH          3    15     205      1
1       US    1558          PHX        CLT          3    15     222      1
2       AA    2400          LAX        DFW          3    20     165      1
3       AA    2466          SFO        DFW          3    20     195      1
4       AS     108          ANC        SEA          3    30     202      0

Grouping data is useful, so let's dive deeper into it. Imagine you want to count how many entries the 'Delay' column has for each flight number. Look at the code example and then at the explanation:
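A minimal sketch of that example, reconstructed from the explanation that follows. It assumes the data is stored in a pandas DataFrame named data; since the full data set is not part of this section, the sketch rebuilds only the five sample rows shown above.

```python
import pandas as pd

# Only the five sample rows from the table above; the full data set is not reproduced
data = pd.DataFrame({
    'Airline': ['CO', 'US', 'AA', 'AA', 'AS'],
    'Flight': [269, 1558, 2400, 2466, 108],
    'AirportFrom': ['SFO', 'PHX', 'LAX', 'SFO', 'ANC'],
    'AirportTo': ['IAH', 'CLT', 'DFW', 'DFW', 'SEA'],
    'DayOfWeek': [3, 3, 3, 3, 3],
    'Time': [15, 15, 20, 20, 30],
    'Length': [205, 222, 165, 195, 202],
    'Delay': [1, 1, 1, 1, 0],
})

# Keep the grouping column and the column to aggregate,
# then count the 'Delay' entries for every distinct 'Flight' value
delays_per_flight = data[['Flight', 'Delay']].groupby('Flight').count()
print(delays_per_flight)
```

In these five rows every flight number occurs once, so each count is 1. Keep in mind that .count() counts entries, not delays: because 'Delay' holds 0 or 1, replacing .count() with .sum() would give the number of delayed flights per flight number instead.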

Explanation:

  • data[['Flight', 'Delay']] - the columns you will work with, including the column you group by.
  • .groupby('Flight') - 'Flight' is the argument passed to the .groupby() method: the name of the column to group by. Rows that have the same value in the 'Flight' column fall into one group. The .count() method that follows then calculates, for each group, how many rows have a value in the 'Delay' column.
  • .count() - an aggregation method applied to each group. In this case, it counts the non-missing values in the 'Delay' column among the rows that share the same 'Flight' value. It is not the only method you can use; the table below lists the most common ones.
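To see how rows end up in the same group, it helps to have a flight number that repeats, which the five sample rows do not. This sketch uses a small hypothetical DataFrame (invented values, not taken from the real data set) in which one 'Delay' value is missing, and iterates over the groups before counting.

```python
import pandas as pd

# Hypothetical toy data: flight 269 appears three times, flight 1558 twice,
# and one 'Delay' value for flight 269 is missing (None becomes NaN)
data = pd.DataFrame({
    'Flight': [269, 1558, 269, 1558, 269],
    'Delay': [1, 0, 0, 1, None],
})

grouped = data[['Flight', 'Delay']].groupby('Flight')

# Iterating yields (group key, sub-DataFrame) pairs, one per distinct 'Flight' value
for flight, rows in grouped:
    print(flight, len(rows))

# .count() counts only the non-missing 'Delay' values in each group
counts = grouped.count()
print(counts)
```

Flight 269 forms a group of three rows, yet .count() reports 2 for it, because the missing value is skipped.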
Method      Description                                              Syntax
.mean()     Mean of the values in a column for each group            data.groupby('column_name').mean()
.median()   Median of the values in a column for each group          data.groupby('column_name').median()
.sum()      Sum of the values in a column for each group             data.groupby('column_name').sum()
.count()    Number of non-missing values in a column for each group  data.groupby('column_name').count()
.min()      Minimum value in a column for each group                 data.groupby('column_name').min()
.max()      Maximum value in a column for each group                 data.groupby('column_name').max()
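The six methods can be compared side by side on the five sample rows. This sketch is illustrative: it groups by 'Airline' (chosen here because AA occurs twice, so one group has more than one row) and first keeps only 'Airline', 'Length', and 'Delay'. Selecting columns this way matters because, in pandas 2.x, .mean() and .median() raise a TypeError if the groups still contain text columns such as 'AirportFrom'.

```python
import pandas as pd

# The five sample rows from the table at the top of this section
data = pd.DataFrame({
    'Airline': ['CO', 'US', 'AA', 'AA', 'AS'],
    'Flight': [269, 1558, 2400, 2466, 108],
    'AirportFrom': ['SFO', 'PHX', 'LAX', 'SFO', 'ANC'],
    'AirportTo': ['IAH', 'CLT', 'DFW', 'DFW', 'SEA'],
    'DayOfWeek': [3, 3, 3, 3, 3],
    'Time': [15, 15, 20, 20, 30],
    'Length': [205, 222, 165, 195, 202],
    'Delay': [1, 1, 1, 1, 0],
})

# Group by 'Airline' and keep only the numeric columns to aggregate
grouped = data[['Airline', 'Length', 'Delay']].groupby('Airline')

means = grouped.mean()      # average Length and Delay per airline
medians = grouped.median()  # median Length and Delay per airline
sums = grouped.sum()        # for the 0/1 'Delay' column: number of delayed flights
counts = grouped.count()    # number of non-missing values per airline
minimums = grouped.min()    # smallest value per airline
maximums = grouped.max()    # largest value per airline

for name, result in [('mean', means), ('median', medians), ('sum', sums),
                     ('count', counts), ('min', minimums), ('max', maximums)]:
    print(name)
    print(result)
```

For AA, the only group with two rows, the mean and median Length are both 180, and .sum() of 'Delay' is 2, meaning both AA flights in the sample were delayed.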

Fill in the gaps to find the mean value of the `'Time'` column for each value of the `'DayOfWeek'` column.

data_extracted = data[['___', 'Time']].___('___').mean()
print(data_extracted)
                 Time
DayOfWeek
3          804.993130
4          804.452984
5          702.888362

