Visualizing Bivariate Distributions with KDE, Jointplots, and Hexbin Plots
When examining retail data, you often need to understand how two numerical features relate to each other. Jointplots and hexbin plots are two visualizations designed for this kind of bivariate analysis.
A jointplot shows the joint distribution of two variables in a central panel (a scatterplot by default, or a kernel density estimate (KDE), hexbin plot, or 2D histogram) and the marginal distribution of each variable along the top and right edges. This helps you spot correlations, clusters, and outliers.
A hexbin plot is especially useful for large datasets. It groups data points into hexagonal bins and colors them by frequency, making dense regions and patterns more apparent.
Suppose you are analyzing a retail dataset with features like price and discount. You want to visualize how discounts are distributed with respect to price, and whether there are any visible trends or groupings.
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Sample retail data
    data = {
        "price": [10, 12, 15, 20, 22, 23, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90],
        "discount": [1, 2, 2, 3, 3, 4, 4, 5, 6, 5, 7, 8, 7, 9, 10, 9, 11, 12, 11, 13]
    }
    df = pd.DataFrame(data)

    # Create a jointplot for price vs. discount
    sns.jointplot(data=df, x="price", y="discount", kind="kde", fill=True, cmap="Blues")
    plt.suptitle("Joint Distribution of Price and Discount (KDE)", y=1.02)
    plt.show()
    import matplotlib.pyplot as plt

    # Use the same data as above (df is defined in the previous block)
    x = df["price"]
    y = df["discount"]

    plt.figure(figsize=(6, 5))
    # Each hexagon is colored by the number of points that fall inside it
    plt.hexbin(x, y, gridsize=10, cmap="Blues", edgecolors="gray")
    plt.colorbar(label="Count in bin")
    plt.xlabel("Price")
    plt.ylabel("Discount")
    plt.title("Hexbin Plot of Price vs. Discount")
    plt.show()
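With only 20 rows, most hexagons in the plot above hold zero or one point, so the binning has little to summarize. The sketch below uses synthetic data to show the situation hexbin plots are designed for. The data is an assumption and not part of the retail sample: the sample size, slope, noise level, seed, and file name are all made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for a large retail table (hypothetical values)
rng = np.random.default_rng(seed=0)
n = 50_000
price = rng.uniform(10, 90, size=n)
discount = 0.14 * price + rng.normal(0, 1.5, size=n)

fig, ax = plt.subplots(figsize=(6, 5))
# mincnt=1 leaves empty hexagons blank instead of drawing them with a zero count
hb = ax.hexbin(price, discount, gridsize=30, cmap="Blues", mincnt=1)
fig.colorbar(hb, ax=ax, label="Count in bin")
ax.set_xlabel("Price")
ax.set_ylabel("Discount")
ax.set_title("Hexbin Plot of Synthetic Price vs. Discount")

# The returned collection stores one count per drawn hexagon
counts = hb.get_array()
print(f"{len(counts)} non-empty bins, busiest bin holds {int(counts.max())} points")

# Placeholder file name; in an interactive session, plt.show() works as well
fig.savefig("hexbin_synthetic.png", bbox_inches="tight")
```

A larger `gridsize` gives smaller hexagons and finer detail, at the cost of noisier counts per bin.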
Both jointplots and hexbin plots help you explore the relationship between two numerical features, but they serve slightly different purposes.
A jointplot with KDE:
- Provides a smooth estimate of the joint probability density;
- Makes it easier to see general trends, clusters, and the spread of the data, even when points overlap;
- Displays the marginal distributions, giving you additional context about each variable individually.
A hexbin plot:
- Is especially effective when you have a large number of data points;
- Reduces overplotting by aggregating points into hexagonal bins;
- Helps you quickly spot dense areas and potential linear or nonlinear relationships.
A KDE jointplot is usually the better choice for smaller datasets or when the shape of each distribution matters. A hexbin plot is usually the better choice for larger datasets, where overlapping points would make a scatterplot unreadable.