Summary  
This chapter covers how to group tabular data by multiple columns using pandas’ groupby method with a list of keys, showing that the order of columns defines a hierarchical (multi-level) index and that you can apply aggregations (sum, mean, count, etc.) on those groups.

General domain of usage  
Flight delay analysis

Watch this video for a hands-on demonstration of grouping by several columns in pandas. You'll see how to group by both 'Flight' and 'Airline' to count delays, as well as how to group by 'AirportFrom' and 'DayOfWeek' to calculate the average flight time. Visual walkthroughs will help you understand the importance of column order and how aggregation works in multi-column groupings.

`.groupby()` メソッドに関する情報の追加。複数の列でグループ化することが可能ですが、この場合は順序が重要。前の章ではフライト番号でデータをグループ化し、遅延の回数をカウントしました。今回は `'Flight'` 列だけでなく、`'Airline'` 列でもグループ化することで、より複雑な処理を行います。データセットの情報を再確認し、次のシンプルな例を見てください（出力は最初の10行のみ表示）：

import pandas as pd
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/plane', index_col = 0)
data_flights = data[['Flight', 'Delay', 'Airline']].groupby(['Flight', 'Airline']).count()
print(data_flights.head(10))

**解説：**

```python
data[['Flight', 'Delay', 'Airline']].groupby(['Flight', 'Airline']).count()
```
- `data[['Flight', 'Delay', 'Airline']]` - 作業対象の列。グループ化に使用する列も含む；
- `.groupby(['Flight', 'Airline'])` - ここで `'Flight'` と `'Airline'` は `.groupby()` 関数の引数。

複数の列でグループ化したい場合は、リストにして指定する必要がある点に注意。今回の場合、データセットの行が `'Flight'` 列で同じ値を持つ場合、それらは同じグループにまとめられる。そのグループ内でさらに `'Airline'` 列で同じ値を持つ行ごとにグループ分けされる。その後、`.count()` メソッドによって、各 `'Delay'` グループ内で `'Airline'` ごとに `'Flight'` 列の行数が計算される。

import unittest
import pandas as pd
import io
import sys


def _dynamic_test(test_case, condition, success_msg, failure_msg):
    if condition:
        test_case._testMethodName = success_msg
        test_case.assertTrue(True, success_msg)
    else:
        test_case._testMethodName = failure_msg
        test_case.fail(failure_msg)


def frames_equal_strict(df1, df2):
    """ÐÐµÑÐµÐ²ÑÑÐºÐ° ÑÑÐ²Ð½Ð¾ÑÑÑ DataFrame Ð· ÑÑÐ°ÑÑÐ²Ð°Ð½Ð½ÑÐ¼ Ð¿Ð¾ÑÑÐ´ÐºÑ ÐºÐ¾Ð»Ð¾Ð½Ð¾Ðº ÑÐ° ÑÐ½Ð´ÐµÐºÑÑ."""
    try:
        return df1.equals(df2)
    except Exception:
        return False


class TestGroupData(unittest.TestCase):
    def test_grouping_average(self):
        """
        1. Group data by 'AirportFrom' and 'DayOfWeek', calculate mean of 'Time'.
        """
        import user_code

        # ÐÑÐ¸Ð³ÑÐ½Ð°Ð»ÑÐ½Ð¸Ð¹ DataFrame
        df = pd.read_csv(
            "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/plane",
            index_col=0,
        )

        # ÐÑÐ°Ð»Ð¾Ð½Ð½Ðµ ÑÑÑÐµÐ½Ð½Ñ
        ref = df[["AirportFrom", "DayOfWeek", "Time"]].groupby(["AirportFrom", "DayOfWeek"]).mean()

        condition = (
            hasattr(user_code, "data_flights")
            and isinstance(user_code.data_flights, pd.DataFrame)
            and frames_equal_strict(user_code.data_flights, ref)
        )

        _dynamic_test(
            self,
            condition,
            "The data is correctly grouped by 'AirportFrom' and 'DayOfWeek' with mean 'Time' calculated.",
            "The grouping or averaging is incorrect. Check the columns order and groupby parameters."
        )


class TestOutput(unittest.TestCase):
    def test_output_head_10(self):
        """
        2. Output the first 10 rows of the grouped DataFrame using .head(10).
        """
        import user_code
        captured_output = io.StringIO()
        sys.stdout = captured_output
        print(user_code.data_flights.head(10))
        sys.stdout = sys.__stdout__

        output_text = captured_output.getvalue().strip()
        condition = len(output_text) > 0
        _dynamic_test(
            self,
            condition,
            "The first 10 rows of 'data_flights' are printed correctly using .head(10).",
            "The output is missing or incorrect. Use print(data_flights.head(10))."
        )


if __name__ == "__main__":
    unittest.main()


test_code.py

このコースは、将来のデータアナリストのために多くの有用な関数を含んでいます。さまざまなデータ抽出方法を学び、条件を設定することもできます。その後、データのグループ化手法に精通することができます。また、データの前処理方法も学びます。各セクションには独自のデータセットが用意されているため、コースは魅力的なものとなっています。

このセクションでは、タイトルやインデックスによって特定の列を出力する方法を学びます。また、インデックスによって行を選択する方法についても理解を深めます。

ここでは、特定の条件を持つデータを抽出する方法を学びます。また、それらを組み合わせたり、自分自身で条件を作成したりする方法も学びます。

このセクションでは、さまざまなデータ条件の設定に関する知識を深めます。データが定義された値のリストに含まれているか、または2つの値の間にあるかを確認する方法を学びます。また、最大値と最小値を見つける方法についても学びます。

このセクションはコースの中でも特に興味深い内容の一つです。ここでは、データをさまざまな方法でグループ化する方法を学びます。特定のデータグループに関する情報を見つけるために、データアナリストとして役立つスキルを身につけることができます。

このセクションはデータアナリストにとって最も重要なものの一つです。なぜなら、データに不適切な形式の欠損値が含まれている場合、作業が不可能になるためです。したがって、ここではそのような不適切な値への対処方法を学びます。

複数の列によるグループ化

解答