Summary  
This chapter covers how to group tabular data by multiple columns using pandas’ groupby method with a list of keys, showing that the order of columns defines a hierarchical (multi-level) index and that you can apply aggregations (sum, mean, count, etc.) on those groups.

General domain of usage  
Flight delay analysis

Watch this video for a hands-on demonstration of grouping by several columns in pandas. You'll see how to group by both 'Flight' and 'Airline' to count delays, as well as how to group by 'AirportFrom' and 'DayOfWeek' to calculate the average flight time. Visual walkthroughs will help you understand the importance of column order and how aggregation works in multi-column groupings.

Let's add some information on the `.groupby()` method. You can group by several columns, but the order is crucial in this case. In the previous chapter, we grouped data by the flight number and counted the number of delays. We can make this task complicated by grouping not only by the `'Flight'` column, but also by the column `'Airline'`. Refresh the information on the dataset and then look at this simple example (the output contains only the first 10 rows):

import pandas as pd
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/plane', index_col = 0)
data_flights = data[['Flight', 'Delay', 'Airline']].groupby(['Flight', 'Airline']).count()
print(data_flights.head(10))

**Explanation:**

```python
data[['Flight', 'Delay', 'Airline']].groupby(['Flight', 'Airline']).count()
```
- `data[['Flight', 'Delay', 'Airline']]` - columns you will work with, including the columns by which you will group;
- `.groupby(['Flight', 'Airline'])` -  here, `'Flight'` and `'Airline'` are arguments of the function `.groupby()`.  

Pay attention; if you want to group by several columns, put them into the list - the order is crucial. So, in our case, if rows of the data set have the same value in the column `'Flight'`, they will relate to one group. Then inside those groups, the function finds other groups for rows with the same value in the column `'Airline'`. Then, due to the method `.count()` that counts the rows, our function will calculate the number of rows in the column `'Delay'` that have the same value in the column `'Airline'` for each `'Flight'` group.

import unittest
import pandas as pd
import io
import sys


def _dynamic_test(test_case, condition, success_msg, failure_msg):
    if condition:
        test_case._testMethodName = success_msg
        test_case.assertTrue(True, success_msg)
    else:
        test_case._testMethodName = failure_msg
        test_case.fail(failure_msg)


def frames_equal_strict(df1, df2):
    """ÐÐµÑÐµÐ²ÑÑÐºÐ° ÑÑÐ²Ð½Ð¾ÑÑÑ DataFrame Ð· ÑÑÐ°ÑÑÐ²Ð°Ð½Ð½ÑÐ¼ Ð¿Ð¾ÑÑÐ´ÐºÑ ÐºÐ¾Ð»Ð¾Ð½Ð¾Ðº ÑÐ° ÑÐ½Ð´ÐµÐºÑÑ."""
    try:
        return df1.equals(df2)
    except Exception:
        return False


class TestGroupData(unittest.TestCase):
    def test_grouping_average(self):
        """
        1. Group data by 'AirportFrom' and 'DayOfWeek', calculate mean of 'Time'.
        """
        import user_code

        # ÐÑÐ¸Ð³ÑÐ½Ð°Ð»ÑÐ½Ð¸Ð¹ DataFrame
        df = pd.read_csv(
            "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/plane",
            index_col=0,
        )

        # ÐÑÐ°Ð»Ð¾Ð½Ð½Ðµ ÑÑÑÐµÐ½Ð½Ñ
        ref = df[["AirportFrom", "DayOfWeek", "Time"]].groupby(["AirportFrom", "DayOfWeek"]).mean()

        condition = (
            hasattr(user_code, "data_flights")
            and isinstance(user_code.data_flights, pd.DataFrame)
            and frames_equal_strict(user_code.data_flights, ref)
        )

        _dynamic_test(
            self,
            condition,
            "The data is correctly grouped by 'AirportFrom' and 'DayOfWeek' with mean 'Time' calculated.",
            "The grouping or averaging is incorrect. Check the columns order and groupby parameters."
        )


class TestOutput(unittest.TestCase):
    def test_output_head_10(self):
        """
        2. Output the first 10 rows of the grouped DataFrame using .head(10).
        """
        import user_code
        captured_output = io.StringIO()
        sys.stdout = captured_output
        print(user_code.data_flights.head(10))
        sys.stdout = sys.__stdout__

        output_text = captured_output.getvalue().strip()
        condition = len(output_text) > 0
        _dynamic_test(
            self,
            condition,
            "The first 10 rows of 'data_flights' are printed correctly using .head(10).",
            "The output is missing or incorrect. Use print(data_flights.head(10))."
        )


if __name__ == "__main__":
    unittest.main()


test_code.py

This course contains a lot of useful functions for a future data analyst. You will learn different ways of extracting data and even set conditions on it. After it, you will be familiar with the methods of grouping data. Also, you will learn how to preprocess data. Each section has its data set so that the course will be gripping.

This section will teach you how to output specific columns by their titles or indices. Also, you will get acquainted with the ways you can select rows  by indices.

Here, you will learn how to extract data that has specific conditions. Also, you will learn how to combine them and even create your own.

In this section, you will expand your knowledge on setting different data conditions. You will learn to check if your data is in a defined list of values or between two values. You will also learn how to find the largest and smallest values.

This section is one of the most fascinating of the course. Here, you will learn how to group data in different ways. It will help you work as a data analyst to find out information on specific data groups.

This section is one of the most significant for a data analyst because if the data contains missing data values in the incorrect format, it will be impossible to work with. Thus, you will learn how to deal with such inappropriate values here. 

Grouping by Several Columns

Solution