Bivariate Analysis
Bivariate analysis is an essential step in exploratory data analysis (EDA) that focuses on examining the relationship between two variables. This process helps you uncover patterns, trends, or associations that may not be visible when looking at variables individually. By analyzing two variables together, you can identify whether changes in one variable are associated with changes in another, which is crucial for hypothesis generation, feature selection, and deeper understanding of your dataset.
```python
import pandas as pd

# Sample DataFrame
data = {
    "age": [22, 25, 47, 52, 46, 56, 55, 60, 62, 61],
    "salary": [25000, 32000, 47000, 52000, 48000, 60000, 58000, 62000, 63000, 64000],
    "department": ["HR", "Finance", "HR", "Engineering", "Engineering",
                   "Finance", "HR", "Engineering", "Finance", "HR"],
}
df = pd.DataFrame(data)

# Select two relevant columns for analysis
age = df["age"]
salary = df["salary"]

# Compute the (Pearson) correlation coefficient between age and salary
correlation = age.corr(salary)
print("Correlation between age and salary:", correlation)
```
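To make the number returned by corr() less of a black box, the sketch below computes Pearson's r directly from its definition: the sum of the products of deviations from the mean, divided by the square root of the product of the summed squared deviations. It then compares the result with pandas. The helper name pearson_r is hypothetical and used only for illustration. The example assumes NumPy is installed and reuses the age and salary values from the sample DataFrame.

```python
import numpy as np
import pandas as pd

# Same sample data as above (age and salary columns only)
df = pd.DataFrame({
    "age": [22, 25, 47, 52, 46, 56, 55, 60, 62, 61],
    "salary": [25000, 32000, 47000, 52000, 48000, 60000, 58000, 62000, 63000, 64000],
})

def pearson_r(x, y):
    """Hypothetical helper: Pearson's r from its definition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    # Covariance term divided by the product of the spread terms
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

manual = pearson_r(df["age"], df["salary"])
built_in = df["age"].corr(df["salary"])  # Series.corr uses method="pearson" by default
print(f"manual:   {manual:.4f}")
print(f"built-in: {built_in:.4f}")
```

Both lines should print the same value. This confirms that the default corr() call measures linear association between the two columns.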
```python
import matplotlib.pyplot as plt
import seaborn as sns

# Scatter plot using seaborn
sns.scatterplot(x="age", y="salary", data=df)
plt.title("Seaborn Scatter Plot of Age vs Salary")
plt.show()
```
When interpreting the correlation coefficient, values close to 1 indicate a strong positive relationship: as one variable increases, the other tends to increase as well. Values close to -1 indicate a strong negative relationship, where one variable tends to decrease as the other increases. Values near 0 suggest little or no linear relationship. The corr() method computes Pearson's r by default, and this coefficient only captures linear association. The scatter plot visually supports this interpretation. A clear upward or downward trend in the points reflects a strong correlation, while a cloud of points with no discernible pattern indicates a weak relationship or none at all.
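The interpretation above can be checked on synthetic data. The sketch below builds hypothetical columns that rise with x, fall with x, or are pure noise, and prints each column's Pearson correlation with x. It also includes a parabola (x squared over a range symmetric around 0). Although this column is fully determined by x, its correlation with x comes out essentially 0, which illustrates why a value near 0 rules out only a *linear* relationship. The data, column names, and random seed are illustrative assumptions and do not come from the lesson's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data illustrating each case described above
rng = np.random.default_rng(0)
x = np.linspace(-5, 5, 101)  # symmetric around 0

examples = pd.DataFrame({
    "x": x,
    "positive": 2 * x + rng.normal(0, 1, x.size),   # tends to rise with x
    "negative": -2 * x + rng.normal(0, 1, x.size),  # tends to fall as x rises
    "noise": rng.normal(0, 1, x.size),              # unrelated to x
    "parabola": x ** 2,                              # strong but non-linear link
})

# Pearson correlation of every other column with x
r = examples.corr()["x"].drop("x")
print(r.round(2))
```

Expect values near 1 and -1 for the first two columns, a small value for the noise column, and roughly 0 for the parabola. This is why it is worth looking at the scatter plot alongside the coefficient.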
```python
# Boxplot to compare salary distribution across departments
plt.figure(figsize=(6, 4))
sns.boxplot(x="department", y="salary", data=df)
plt.title("Salary Distribution by Department")
plt.xlabel("Department")
plt.ylabel("Salary")
plt.show()
```
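The boxplot compares a numeric variable (salary) across the categories of a categorical one (department). The box spans the first to third quartile, and the line inside it marks the median. The sketch below prints those same statistics per department with a pandas groupby, which is a quick numeric check of what the plot shows. The column labels q1 and q3 are illustrative choices, not names that seaborn produces.

```python
import pandas as pd

# Same sample data as in the first example (salary and department only)
df = pd.DataFrame({
    "salary": [25000, 32000, 47000, 52000, 48000, 60000, 58000, 62000, 63000, 64000],
    "department": ["HR", "Finance", "HR", "Engineering", "Engineering",
                   "Finance", "HR", "Engineering", "Finance", "HR"],
})

# Per-department statistics that a boxplot summarizes visually:
# the box edges are the quartiles and the inner line is the median
summary = df.groupby("department")["salary"].agg(
    count="count",
    q1=lambda s: s.quantile(0.25),
    median="median",
    q3=lambda s: s.quantile(0.75),
)
print(summary)
```

With only three or four employees per department, these summaries (and the boxes themselves) rest on very few points. Read them as a rough comparison rather than a firm conclusion.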