
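Bivariate analysis is an essential step in exploratory data analysis (EDA) that focuses on examining the relationship between two variables. It reveals patterns, trends, and associations that cannot be found by looking at each variable in isolation. Analyzing two variables together lets you determine whether changes in one are associated with changes in the other, which helps with hypothesis generation, feature selection, and a deeper understanding of the dataset.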


import pandas as pd

# Sample DataFrame
data = {
    "age": [22, 25, 47, 52, 46, 56, 55, 60, 62, 61],
    "salary": [25000, 32000, 47000, 52000, 48000, 60000, 58000, 62000, 63000, 64000],
    "department": ["HR", "Finance", "HR", "Engineering", "Engineering", "Finance", "HR", "Engineering", "Finance", "HR"]
}
df = pd.DataFrame(data)

# Select the two columns to analyze
age = df["age"]
salary = df["salary"]

# Compute the correlation coefficient between age and salary
# (Series.corr uses the Pearson method by default)
correlation = age.corr(salary)
print("Correlation between age and salary:", correlation)


import matplotlib.pyplot as plt
import seaborn as sns

# Scatter plot using seaborn
sns.scatterplot(x="age", y="salary", data=df)
plt.title("Seaborn Scatter Plot of Age vs Salary")
plt.show()

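When interpreting the correlation coefficient, a value close to 1 indicates a strong positive relationship: as one variable increases, the other tends to increase as well. A value close to -1 indicates a strong negative relationship: as one variable increases, the other tends to decrease. A value near 0 indicates little or no linear relationship. A scatter plot supports this interpretation visually: points that follow a clear upward or downward trend suggest a strong correlation, while a cloud of points with no pattern suggests a weak correlation or none at all.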
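The three cases can be checked on a small toy example. This is a minimal sketch: the series `x`, `increasing`, `decreasing`, and `symmetric` are hypothetical values invented for illustration and are not part of the lesson's dataset. The last series shows why a coefficient near 0 only rules out a linear relationship, since its values depend entirely on `x` through a curve.

```python
import pandas as pd

# Hypothetical toy data chosen to illustrate the three cases
x = pd.Series([1, 2, 3, 4, 5])
increasing = pd.Series([2, 4, 6, 8, 10])  # rises in step with x
decreasing = pd.Series([10, 8, 6, 4, 2])  # falls as x rises
symmetric = pd.Series([4, 1, 0, 1, 4])    # (x - 3) ** 2: depends on x, but not linearly

# Pearson correlation (the default method of Series.corr)
r_pos = x.corr(increasing)
r_neg = x.corr(decreasing)
r_zero = x.corr(symmetric)

print("Perfectly increasing:", round(r_pos, 3))
print("Perfectly decreasing:", round(r_neg, 3))
print("Symmetric curve:", round(r_zero, 3))
```

Plotting `symmetric` against `x` would show a clear U shape even though the coefficient is 0, which is one reason to pair the coefficient with a scatter plot.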


# Boxplot to compare salary distribution across departments
plt.figure(figsize=(6, 4))
sns.boxplot(x="department", y="salary", data=df)
plt.title("Salary Distribution by Department")
plt.xlabel("Department")
plt.ylabel("Salary")
plt.show()
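A box plot compares a numeric variable across the categories of another variable, and the same comparison can be read off as numbers. The sketch below is one possible companion to the plot rather than part of the original lesson: it rebuilds the sample salary and department columns and summarizes salary per department with `groupby`. The choice of statistics (`count`, `median`, `min`, `max`) is an assumption made for illustration.

```python
import pandas as pd

# Same sample salary and department values as in the lesson
df = pd.DataFrame({
    "salary": [25000, 32000, 47000, 52000, 48000, 60000, 58000, 62000, 63000, 64000],
    "department": ["HR", "Finance", "HR", "Engineering", "Engineering",
                   "Finance", "HR", "Engineering", "Finance", "HR"],
})

# Per-department summary of salary: group size, median, and range
summary = df.groupby("department")["salary"].agg(["count", "median", "min", "max"])
print(summary)
```

With only three or four rows per department, these statistics (and the boxes in the plot) are best read as a rough comparison rather than as stable estimates.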


Section 1. Chapter 22


Bivariate Analysis

