Summary  
This chapter covers how to explore numerical data distributions by creating histograms and overlaying kernel density estimate curves using Seaborn’s `histplot` function in Python.

General domain of usage  
Data science exploratory analysis

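Before you can draw meaningful conclusions from a dataset, you need to understand its structure and key characteristics. This process is called **data exploration**. In data exploration, you examine the data from several angles, summarize its main features, and visualize important patterns. Doing so lets you spot trends, outliers, and potential problems before carrying out more detailed statistical analysis.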

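One of the most useful tools for exploring numerical data is the **histogram**. A histogram is a type of bar chart that shows how often values fall within different ranges in a dataset. Each bar represents a range of values (called a "bin"), and the height of the bar shows the number of data points that fall in that range. A histogram lets you see the distribution, center, and spread of the data at a glance.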
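To make the idea of bins concrete, here is a minimal sketch that counts how many values fall into each range, which is the calculation a histogram draws as bars. The body-mass values, the bin count, and the range are made up for illustration and are not taken from the course dataset.

```python
import numpy as np

# Hypothetical body masses in grams (illustrative values only)
masses = np.array([3200, 3450, 3500, 3700, 3750, 3800, 4100, 4200, 4650, 5400])

# Split the range 3000-5500 into 5 equal-width bins and count the values in each
counts, bin_edges = np.histogram(masses, bins=5, range=(3000, 5500))

# Each (left edge, right edge, count) triple corresponds to one bar of the histogram
for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:.0f}-{right:.0f} g: {count} value(s)")
```

Every value lands in exactly one bin, so the counts always add up to the number of data points. Changing `bins` changes how coarse or fine the picture of the distribution is.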

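In Python, you can easily create a histogram with the `histplot` function from the **seaborn** library. `histplot` takes the data and displays its distribution as a histogram. You can also add a **kernel density estimate (KDE)** curve to the plot, which shows a smooth approximation of the data distribution. This gives a clearer view of the underlying pattern in the data.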

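In the upcoming task, you will use the `histplot` function to visualize the distribution of penguin body mass. This lets you explore the dataset and prepare for further statistical analysis.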

import unittest
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

def _dynamic_test(test_case, condition, success_msg, failure_msg):
    if condition:
        test_case._testMethodName = success_msg
        test_case.assertTrue(True, success_msg)
    else:
        test_case._testMethodName = failure_msg
        test_case.fail(failure_msg)


class TestDataReading(unittest.TestCase):
    def test_data_loaded(self):
        import user_code

        expected_data = pd.read_csv(
            'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a849660e-ddfa-4033-80a6-94a1b7772e23/section_1/confidence.csv',
            index_col=0
        )

        condition = (
            isinstance(user_code.data, pd.DataFrame)
            and user_code.data.shape == expected_data.shape
        )

        _dynamic_test(
            self,
            condition,
            "The CSV file is correctly read into the 'data' variable.",
            "The CSV file is not read correctly into the 'data' variable."
        )


class TestPlot(unittest.TestCase):
    def test_histplot(self):
        import user_code

        # Check whether the plot was created with seaborn.histplot
        plot_obj = user_code.plot

        condition = (
            hasattr(plot_obj, "get_xlabel")
            and plot_obj.get_xlabel() == "The Mass"
            and plot_obj.get_ylabel() == "The Quantity"
        )

        _dynamic_test(
            self,
            condition,
            "The histplot is created with correct parameters.",
            "The histplot parameters are incorrect."
        )


if __name__ == "__main__":
    unittest.main()


test_code.py

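Build a solid foundation in statistics using Python. Learn essential statistical concepts and apply them in practice with NumPy and pandas. Work hands-on from basic measures such as mean and variance through hypothesis testing, confidence intervals, and data-driven insights.

Learn the basic principles of statistics, including data types, measures of central tendency, and the key differences between samples and populations.

Learn how to calculate and interpret the mean, median, and mode using Python. Practice these operations on real datasets with pandas.

Understand how variance and standard deviation measure the spread of data. Learn to calculate both by hand and with Python tools.

Explore how covariance and correlation describe relationships between variables. Practice calculating and comparing both measures in Python.

Master confidence intervals and estimate population parameters. Calculate and interpret intervals on real data using NumPy, pandas, and visualization libraries.

Learn the basics of hypothesis testing and t-tests. Understand how to design, run, and interpret tests in Python to support data-driven decisions.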

Exploring the dataset
