Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Grouping by Several Columns | Aggregating Data
Advanced Techniques in pandas
course content

Contenido del Curso

Advanced Techniques in pandas

Advanced Techniques in pandas

1. Getting Familiar With Indexing and Selecting Data
2. Dealing With Conditions
3. Extracting Data
4. Aggregating Data
5. Preprocessing Data

bookGrouping by Several Columns

Let's add some information on the .groupby() method. You can group by several columns, but the order is crucial in this case. In the previous chapter, we grouped data by the flight number and counted the number of delays. We can make this task complicated by grouping not only by the 'Flight' column, but also by the column 'Airline'. Refresh the information on the dataset and then look at this simple example (the output contains only the first 10 rows):

1234
import pandas as pd data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/plane', index_col = 0) data_flights = data[['Flight', 'Delay', 'Airline']].groupby(['Flight', 'Airline']).count() print(data_flights.head(10))
copy

Explanation:

  • data[['Flight', 'Delay', 'Airline']] - columns you will work with, including the columns by which you will group;
  • .groupby(['Flight', 'Airline']) - here, 'Flight' and 'Airline' are arguments of the function .groupby().

Pay attention; if you want to group by several columns, put them into the list - the order is crucial. So, in our case, if rows of the data set have the same value in the column 'Flight', they will relate to one group. Then inside those groups, the function finds other groups for rows with the same value in the column 'Airline'. Then, due to the method .count() that counts the rows, our function will calculate the number of rows in the column 'Delay' that have the same value in the column 'Airline' for each 'Flight' group.

Tarea
test

Swipe to show code editor

Your task here is to group data by the airport from which the flight started and then by the weekday. Calculate the average time for the groups. Follow the algorithm to manage the task:

  1. Group data:
    • Extract the columns 'AirportFrom', 'DayOfWeek', and 'Time' from data (in this order);
    • Apply the .groupby() method to the previous columns;
    • Within the .groupby() method, put the columns 'AirportFrom' and 'DayOfWeek'; the order is crucial;
    • Calculate the mean value of the column 'Time'.
  2. Output the first 10 rows of the data_flights.

Switch to desktopCambia al escritorio para practicar en el mundo realContinúe desde donde se encuentra utilizando una de las siguientes opciones
¿Todo estuvo claro?

¿Cómo podemos mejorarlo?

¡Gracias por tus comentarios!

Sección 4. Capítulo 2
toggle bottom row

bookGrouping by Several Columns

Let's add some information on the .groupby() method. You can group by several columns, but the order is crucial in this case. In the previous chapter, we grouped data by the flight number and counted the number of delays. We can make this task complicated by grouping not only by the 'Flight' column, but also by the column 'Airline'. Refresh the information on the dataset and then look at this simple example (the output contains only the first 10 rows):

1234
import pandas as pd data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/plane', index_col = 0) data_flights = data[['Flight', 'Delay', 'Airline']].groupby(['Flight', 'Airline']).count() print(data_flights.head(10))
copy

Explanation:

  • data[['Flight', 'Delay', 'Airline']] - columns you will work with, including the columns by which you will group;
  • .groupby(['Flight', 'Airline']) - here, 'Flight' and 'Airline' are arguments of the function .groupby().

Pay attention; if you want to group by several columns, put them into the list - the order is crucial. So, in our case, if rows of the data set have the same value in the column 'Flight', they will relate to one group. Then inside those groups, the function finds other groups for rows with the same value in the column 'Airline'. Then, due to the method .count() that counts the rows, our function will calculate the number of rows in the column 'Delay' that have the same value in the column 'Airline' for each 'Flight' group.

Tarea
test

Swipe to show code editor

Your task here is to group data by the airport from which the flight started and then by the weekday. Calculate the average time for the groups. Follow the algorithm to manage the task:

  1. Group data:
    • Extract the columns 'AirportFrom', 'DayOfWeek', and 'Time' from data (in this order);
    • Apply the .groupby() method to the previous columns;
    • Within the .groupby() method, put the columns 'AirportFrom' and 'DayOfWeek'; the order is crucial;
    • Calculate the mean value of the column 'Time'.
  2. Output the first 10 rows of the data_flights.

Switch to desktopCambia al escritorio para practicar en el mundo realContinúe desde donde se encuentra utilizando una de las siguientes opciones
¿Todo estuvo claro?

¿Cómo podemos mejorarlo?

¡Gracias por tus comentarios!

Sección 4. Capítulo 2
toggle bottom row

bookGrouping by Several Columns

Let's add some information on the .groupby() method. You can group by several columns, but the order is crucial in this case. In the previous chapter, we grouped data by the flight number and counted the number of delays. We can make this task complicated by grouping not only by the 'Flight' column, but also by the column 'Airline'. Refresh the information on the dataset and then look at this simple example (the output contains only the first 10 rows):

1234
import pandas as pd data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/plane', index_col = 0) data_flights = data[['Flight', 'Delay', 'Airline']].groupby(['Flight', 'Airline']).count() print(data_flights.head(10))
copy

Explanation:

  • data[['Flight', 'Delay', 'Airline']] - columns you will work with, including the columns by which you will group;
  • .groupby(['Flight', 'Airline']) - here, 'Flight' and 'Airline' are arguments of the function .groupby().

Pay attention; if you want to group by several columns, put them into the list - the order is crucial. So, in our case, if rows of the data set have the same value in the column 'Flight', they will relate to one group. Then inside those groups, the function finds other groups for rows with the same value in the column 'Airline'. Then, due to the method .count() that counts the rows, our function will calculate the number of rows in the column 'Delay' that have the same value in the column 'Airline' for each 'Flight' group.

Tarea
test

Swipe to show code editor

Your task here is to group data by the airport from which the flight started and then by the weekday. Calculate the average time for the groups. Follow the algorithm to manage the task:

  1. Group data:
    • Extract the columns 'AirportFrom', 'DayOfWeek', and 'Time' from data (in this order);
    • Apply the .groupby() method to the previous columns;
    • Within the .groupby() method, put the columns 'AirportFrom' and 'DayOfWeek'; the order is crucial;
    • Calculate the mean value of the column 'Time'.
  2. Output the first 10 rows of the data_flights.

Switch to desktopCambia al escritorio para practicar en el mundo realContinúe desde donde se encuentra utilizando una de las siguientes opciones
¿Todo estuvo claro?

¿Cómo podemos mejorarlo?

¡Gracias por tus comentarios!

Let's add some information on the .groupby() method. You can group by several columns, but the order is crucial in this case. In the previous chapter, we grouped data by the flight number and counted the number of delays. We can make this task complicated by grouping not only by the 'Flight' column, but also by the column 'Airline'. Refresh the information on the dataset and then look at this simple example (the output contains only the first 10 rows):

1234
import pandas as pd data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/plane', index_col = 0) data_flights = data[['Flight', 'Delay', 'Airline']].groupby(['Flight', 'Airline']).count() print(data_flights.head(10))
copy

Explanation:

  • data[['Flight', 'Delay', 'Airline']] - columns you will work with, including the columns by which you will group;
  • .groupby(['Flight', 'Airline']) - here, 'Flight' and 'Airline' are arguments of the function .groupby().

Pay attention; if you want to group by several columns, put them into the list - the order is crucial. So, in our case, if rows of the data set have the same value in the column 'Flight', they will relate to one group. Then inside those groups, the function finds other groups for rows with the same value in the column 'Airline'. Then, due to the method .count() that counts the rows, our function will calculate the number of rows in the column 'Delay' that have the same value in the column 'Airline' for each 'Flight' group.

Tarea
test

Swipe to show code editor

Your task here is to group data by the airport from which the flight started and then by the weekday. Calculate the average time for the groups. Follow the algorithm to manage the task:

  1. Group data:
    • Extract the columns 'AirportFrom', 'DayOfWeek', and 'Time' from data (in this order);
    • Apply the .groupby() method to the previous columns;
    • Within the .groupby() method, put the columns 'AirportFrom' and 'DayOfWeek'; the order is crucial;
    • Calculate the mean value of the column 'Time'.
  2. Output the first 10 rows of the data_flights.

Switch to desktopCambia al escritorio para practicar en el mundo realContinúe desde donde se encuentra utilizando una de las siguientes opciones
Sección 4. Capítulo 2
Switch to desktopCambia al escritorio para practicar en el mundo realContinúe desde donde se encuentra utilizando una de las siguientes opciones
We're sorry to hear that something went wrong. What happened?
some-alt