Summary  
This chapter covers advanced DataFrame grouping techniques, including using custom aggregation functions with apply, grouping by multiple columns for multi-level analysis, and applying filter and transform operations to refine and reshape group-wise results.

General domain of usage  
Retail sales performance analysis

When grouping data, the built-in **pandas** functions (such as `.mean()` or `.min()`) are sometimes not enough.


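Look at the `'Length'` column, which records each flight's duration in minutes. Suppose you want the maximum duration in hours for every combination of values in the `'Flight'` column and then the `'Airline'` column. To get it, compute the maximum of the `'Length'` column for each group key and divide that value by `60`. See the example and explanation below.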

```python
import pandas as pd

# Load the flights data; the first CSV column becomes the index
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/plane', index_col=0)
# Maximum 'Length' (minutes) per (Flight, Airline) group, converted to hours
data_flights = data[['Flight', 'Airline', 'Length']].groupby(['Flight', 'Airline']).apply(lambda x: x['Length'].max()/60)
print(data_flights.head(10))
```
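Grouping by two columns produces a Series whose index has two levels, one per group key. The sketch below shows one way to read such a result. It uses a tiny made-up table instead of the course dataset; the `flights` variable, its values, and the `MaxHours` column name are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only.
flights = pd.DataFrame({
    'Flight': [101, 101, 202],
    'Airline': ['AA', 'DL', 'DL'],
    'Length': [180, 90, 300],  # minutes
})

# Same pattern as in the lesson: longest flight per group, in hours.
max_hours = flights.groupby(['Flight', 'Airline']).apply(
    lambda x: x['Length'].max() / 60
)

# Look up a single (Flight, Airline) pair with a tuple key.
one_value = max_hours.loc[(101, 'DL')]
print(one_value)

# Turn the two index levels back into ordinary columns.
flat = max_hours.reset_index(name='MaxHours')
print(flat)
```

Flattening with `reset_index` is often convenient when the grouped result needs to be filtered or merged like a regular table.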

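**Explanation:**

This example is slightly more complex than the one in the previous chapter, but the data is grouped in the same way. The part to focus on here is the `.apply()` method.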

```python
.apply(lambda x: x['Length'].max()/60)
```
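- `.apply()` - a method that applies the given function to each group of the selected columns.
- In the `lambda` function, `x` is the argument and `x['Length'].max()/60` is the expression. For each group key, the function finds the **maximum** value of `'Length'` and divides that aggregate by `60`.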
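To see what the `lambda` receives and returns, it can help to run the same pattern on a small table where the answer is easy to check by hand. The following is a minimal sketch on made-up data (the `flights` frame and its values are assumptions, not part of the course dataset). It also compares the custom function with the built-in `.max()` followed by a division, which should give the same numbers for this simple case.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only.
flights = pd.DataFrame({
    'Flight': [101, 101, 101, 202, 202],
    'Airline': ['AA', 'AA', 'DL', 'DL', 'DL'],
    'Length': [120, 180, 90, 240, 300],  # minutes
})

grouped = flights[['Flight', 'Airline', 'Length']].groupby(['Flight', 'Airline'])

# Custom aggregation: the lambda is called once per group with that group's rows.
max_hours_apply = grouped.apply(lambda x: x['Length'].max() / 60)

# Built-in aggregation followed by a division, for comparison.
max_hours_builtin = grouped['Length'].max() / 60

print(max_hours_apply)
print((max_hours_apply.values == max_hours_builtin.values).all())
```

For a calculation this simple the built-in route is shorter. `.apply()` becomes useful when the per-group logic combines several columns or steps that no single built-in covers.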

```python
import contextlib
import io
import unittest

import pandas as pd


def _dynamic_test(test_case, condition, success_msg, failure_msg):
    # Rename the test so the reported name carries the feedback message
    if condition:
        test_case._testMethodName = success_msg
        test_case.assertTrue(True, success_msg)
    else:
        test_case._testMethodName = failure_msg
        test_case.fail(failure_msg)


def series_equal(s1, s2):
    """Check two Series for equality, taking index and values into account (intended to ignore the Series name)."""
    try:
        return s1.reset_index(drop=False).equals(s2.reset_index(drop=False))
    except Exception:
        return False


class TestFlightGrouping(unittest.TestCase):
    def test_grouped_min_sum(self):
        """
        Check if the grouped Series has correct values for min(Length + Time)
        by AirportFrom, Airline, and DayOfWeek.
        """
        import user_code

        # reference dataset
        url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/plane"
        df = pd.read_csv(url, index_col=0)

        # expected result
        ref_cols = ["AirportFrom", "Airline", "DayOfWeek", "Time", "Length"]
        ref = df[ref_cols].groupby(["AirportFrom", "Airline", "DayOfWeek"]).apply(
            lambda x: (x["Length"] + x["Time"]).min()
        )

        # user result
        assert hasattr(user_code, "data_flights"), "Variable 'data_flights' is missing."
        user_result = user_code.data_flights

        # check equality
        condition = isinstance(user_result, pd.Series) and series_equal(user_result, ref)
        _dynamic_test(
            self,
            condition,
            "The grouped Series 'data_flights' matches the expected minimum (Length + Time) per group.",
            "The result of 'data_flights' is incorrect. Ensure you group by AirportFrom, Airline, and DayOfWeek, "
            "and compute the minimum of (Length + Time)."
        )


class TestOutput(unittest.TestCase):
    def test_print_output(self):
        """
        Check that the output of print(data_flights.head(10)) is not empty.
        """
        import user_code

        # Capture stdout; redirect_stdout restores it even if print() raises
        captured_output = io.StringIO()
        with contextlib.redirect_stdout(captured_output):
            print(user_code.data_flights.head(10))

        output_text = captured_output.getvalue().strip()
        condition = len(output_text) > 0
        _dynamic_test(
            self,
            condition,
            "The output displays the first 10 rows of 'data_flights' correctly.",
            "The output is missing or incorrect. Use print(data_flights.head(10))."
        )


if __name__ == "__main__":
    unittest.main()
```

test_code.py
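The test file expects a Series named `data_flights` that holds the minimum of `Length + Time` for each combination of `AirportFrom`, `Airline`, and `DayOfWeek`. A minimal sketch of that computation is shown here on a small made-up table, so the result can be verified by hand. The column names come from the test file; the `df` values are assumptions for illustration, and the real exercise would load the course dataset instead.

```python
import pandas as pd

# Hypothetical toy data; the real exercise reads the course CSV instead.
df = pd.DataFrame({
    'AirportFrom': ['ATL', 'ATL', 'ATL', 'JFK'],
    'Airline': ['DL', 'DL', 'DL', 'AA'],
    'DayOfWeek': [1, 1, 2, 1],
    'Time': [600, 300, 450, 700],
    'Length': [120, 200, 90, 60],
})

cols = ['AirportFrom', 'Airline', 'DayOfWeek', 'Time', 'Length']

# For each three-column group, add the two columns row by row, then take the smallest sum.
data_flights = df[cols].groupby(['AirportFrom', 'Airline', 'DayOfWeek']).apply(
    lambda x: (x['Length'] + x['Time']).min()
)
print(data_flights.head(10))
```

Here the custom function is doing real work: the sum of two columns has to be formed inside each group before the minimum is taken, which a single built-in aggregation on one column would not express.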


Complicated Grouping
