Summary  
This chapter explains how to use group-by operations with the `.agg()` method to apply multiple aggregation functions to one or more DataFrame columns at once, producing hierarchical summary tables.  

General domain of usage  
Data analysis

Watch this video for a clear, step-by-step walkthrough of advanced grouping and aggregation in pandas. You'll see how to use the `.groupby()` and `.agg()` methods to perform multiple, customized summary calculations at the same time. The video demonstrates grouping by one or more columns, applying several aggregation functions to different columns, and interpreting the resulting DataFrame with MultiIndex columns. By the end, you'll understand how to efficiently generate detailed summary tables for real-world data analysis tasks using pandas.

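Let's take a closer look at the `.groupby()` method. As you already know, it can be combined with the `.agg()` method. The main advantage of `.agg()` is its flexibility: it can apply **multiple, different** aggregations to **multiple** columns at the same time and return a tidy summary table.

Look at the example below. The flights are grouped by `'Airline'`. Then `.agg()` counts the total number of flights (using the `'Delay'` column) and, at the same time, finds the shortest and longest flight durations (using the `'Length'` column). Handy, isn't it?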

import pandas as pd
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/plane', index_col=0)
data_flights = data.groupby('Airline').agg({'Delay': 'count', 'Length': ['min', 'max']})
print(data_flights.head(10))

**Explanation:**

```python
.agg({'Delay': 'count', 'Length': ['min', 'max']})
```
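* `.agg()`: Short for "aggregate". This method condenses the grouped data into summary statistics according to the rules you specify.
* `{}`: A Python dictionary assigns specific operations to specific columns. Each key is a target column name, and each value is the function (or list of functions) to apply to it.
* `'Delay': 'count'`: Applies the count function to the `'Delay'` column of each group. Note that you pass the string alias `'count'` rather than calling `count()`; pandas recognizes these standard statistical function names.
* `'Length': ['min', 'max']`: To apply **multiple** functions to a single column, put the function names in a list `[]`. Here, the minimum and maximum of the `'Length'` column are computed.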
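
To make the dictionary form concrete, here is a minimal sketch on a small hand-made DataFrame. The column names mirror the flights dataset, but the rows are invented for illustration and are not taken from the real file.

```python
import pandas as pd

# Toy data: the values are invented; only the column names follow the flights dataset
flights = pd.DataFrame({
    'Airline': ['AA', 'AA', 'DL', 'DL', 'DL'],
    'Delay':   [0, 1, 1, 0, 1],
    'Length':  [120, 95, 200, 60, 150],
})

# One function (a string alias) for 'Delay', a list of functions for 'Length'
summary = flights.groupby('Airline').agg({'Delay': 'count', 'Length': ['min', 'max']})
print(summary)
```

Since `'count'` counts the non-missing values in each group, the result here reports how many rows each airline has, next to the shortest and longest `'Length'` for that airline.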

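Because a list of functions was applied, the resulting DataFrame automatically gets hierarchical (MultiIndex) columns. The top level holds the original column names (`Delay`, `Length`) and the function names appear beneath them, so `min` and `max` are neatly grouped under `Length`.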
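
The hierarchical columns can be addressed with tuples, or flattened into single-level names when a plain table is easier to work with. The sketch below uses invented rows and a hypothetical naming scheme (`Length_min` and so on); it shows one possible approach, not the only one.

```python
import pandas as pd

# Toy data: the values are invented; only the column names follow the flights dataset
flights = pd.DataFrame({
    'Airline': ['AA', 'AA', 'DL', 'DL', 'DL'],
    'Delay':   [0, 1, 1, 0, 1],
    'Length':  [120, 95, 200, 60, 150],
})
summary = flights.groupby('Airline').agg({'Delay': 'count', 'Length': ['min', 'max']})

# The columns form a MultiIndex of (column, function) pairs
print(summary.columns.tolist())

# Select one sub-column with a tuple, or a whole top-level group by its name
longest = summary[('Length', 'max')]
length_stats = summary['Length']

# Flatten to single-level names by joining both levels with an underscore
flat = summary.copy()
flat.columns = ['_'.join(pair) for pair in flat.columns]
print(flat.columns.tolist())
```

Flattening is optional. Keeping the MultiIndex is convenient when you want to pull out every statistic for one column at once, as `summary['Length']` does here.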

import unittest
import pandas as pd
import io
import sys


def _dynamic_test(test_case, condition, success_msg, failure_msg):
    if condition:
        test_case._testMethodName = success_msg
        test_case.assertTrue(True, success_msg)
    else:
        test_case._testMethodName = failure_msg
        test_case.fail(failure_msg)


def frames_equal(df1, df2):
    """Check whether two DataFrames are equal after sorting both by index."""
    try:
        df1_sorted = df1.sort_index()
        df2_sorted = df2.sort_index()
        return df1_sorted.equals(df2_sorted)
    except Exception:
        return False


class TestFlightAggregations(unittest.TestCase):
    def test_group_and_aggregate(self):
        """
        Check if the grouped DataFrame correctly aggregates:
        - mean and max of 'Time';
        - median of 'Length';
        grouped by ['AirportFrom', 'AirportTo'] (in that order).
        """
        import user_code

        # Reference dataset
        url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/plane"
        df = pd.read_csv(url, index_col=0)

        # Expected result
        ref = df.groupby(['AirportFrom', 'AirportTo']).agg({'Time': ['mean', 'max'], 'Length': 'median'})

        # User result
        assert hasattr(user_code, "data_flights"), "Variable 'data_flights' is missing."
        user_result = user_code.data_flights

        # Compare
        condition = isinstance(user_result, pd.DataFrame) and frames_equal(user_result, ref)
        _dynamic_test(
            self,
            condition,
            "The grouped DataFrame 'data_flights' correctly calculates mean, max (Time) and median (Length).",
            "The aggregation result is incorrect. Ensure you group by ['AirportFrom', 'AirportTo'] and calculate the correct aggregations."
        )


class TestOutput(unittest.TestCase):
    def test_output_head_10(self):
        """
        Check that the output of print(data_flights.head(10)) is not empty.
        """
        import user_code
        captured_output = io.StringIO()
        sys.stdout = captured_output
        try:
            print(user_code.data_flights.head(10))
        finally:
            # Restore stdout even if printing raises an exception
            sys.stdout = sys.__stdout__

        output_text = captured_output.getvalue().strip()
        condition = len(output_text) > 0
        _dynamic_test(
            self,
            condition,
            "The first 10 rows of 'data_flights' are printed correctly using .head(10).",
            "The output is missing or incorrect. Use print(data_flights.head(10))."
        )


if __name__ == "__main__":
    unittest.main()


test_code.py

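This course covers many functions that are useful for future data analysts. You will learn several ways to extract data and to set conditions on it. After that, you will become familiar with data grouping techniques and learn how to preprocess data. Each section comes with its own dataset, which keeps the course engaging.

In this section, you will learn how to output specific columns by title or by index. You will also learn how to select rows by index.

Here you will learn how to extract data that meets specific conditions, how to combine conditions, and how to write your own.

This section deepens your knowledge of setting conditions on data. You will learn how to check whether a value belongs to a defined list of values or lies between two values. You will also learn how to find maximum and minimum values.

This section is one of the most interesting in the course. Here you will learn to group data in various ways, a skill that helps a data analyst find information about specific groups of data.

This section is one of the most important for data analysts, because data that contains missing values in an unsuitable format can make further work impossible. Here you will learn how to deal with such values.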

Advanced Grouping

Solution
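
Judging by the checks in the test code above, a solution would group by `['AirportFrom', 'AirportTo']` (in that order), aggregate `'Time'` with mean and max and `'Length'` with median, and store the result in `data_flights`. The sketch below substitutes a few invented rows for the course CSV so that it runs offline; in the actual exercise, `data` would be loaded with `pd.read_csv` as in the example earlier in the chapter.

```python
import pandas as pd

# Invented sample rows standing in for the course dataset (an assumption for illustration)
data = pd.DataFrame({
    'AirportFrom': ['SFO', 'SFO', 'SFO', 'JFK'],
    'AirportTo':   ['LAX', 'LAX', 'SEA', 'LAX'],
    'Time':        [600, 720, 900, 480],
    'Length':      [80, 90, 120, 360],
})

# Group by both airports, then apply two functions to 'Time' and one to 'Length'
data_flights = data.groupby(['AirportFrom', 'AirportTo']).agg(
    {'Time': ['mean', 'max'], 'Length': 'median'}
)
print(data_flights.head(10))
```

Grouping by two columns produces a MultiIndex on the rows as well, so each row of the result is identified by an (`AirportFrom`, `AirportTo`) pair.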