メニューを表示するにはスワイプしてください

定義

共分散は、2つの確率変数の同時変動を測定する指標。

標本と母集団の共分散の公式は異なるが、ここでは詳細には扱わない。この章では、次のデータセットの共分散の計算に焦点を当てる。

Store_ID: the unique id of the store;
Store_Area: the area of the store;
Items_Available: the number of items that are available in the store;
Daily_Customer_Count: the daily number of customers in the store;
Store_Sales: the number of sales in the store.

Pythonによる共分散の計算

Pythonで共分散を計算するには、NumPyライブラリのnp.cov()関数を使用。2つのパラメータとして、共分散を計算したいデータ系列を指定。

結果はインデックス[0,1]の値。その他の出力値については本コースでは扱わない。例を参照:


              123456789
            
import pandas as pd 
import numpy as np

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a849660e-ddfa-4033-80a6-94a1b7772e23/update/Stores.csv')

# Calculating covariance 
cov = np.cov(df['Store_Area'], df['Items_Available'])[0,1]

print(round(cov, 2))

これは値が同じ方向に動くことを示す。店舗面積が大きいほど、取り扱う商品の数も多くなるため、これは妥当。共分散の大きな欠点の一つは、その値が無限大になる可能性がある点。

すべて明確でしたか？

フィードバックありがとうございます！

セクション 1. 章 19

AIに質問する

何でも質問するか、提案された質問の1つを試してチャットを始めてください

共分散

定義