Visualizing Numerical Features: Histograms, KDE, Boxplots
Understanding how numerical features are distributed is a key part of exploratory data analysis (EDA) in retail datasets. Three essential visualization tools for this purpose are histograms, kernel density estimation (KDE) plots, and boxplots. Each offers a different perspective on the shape, spread, and other characteristics of your data. In retail analysis, these plots help you examine the distributions of product prices, sales amounts, and transaction values, making it easier to spot patterns, skewness, or potential outliers that could affect business decisions.
```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Sample retail data
data = pd.DataFrame({
    "product_price": [9.99, 12.49, 7.99, 19.99, 14.99, 11.49, 16.99, 9.99, 21.99, 8.49, 15.49, 13.99]
})

plt.figure(figsize=(6, 4))
sns.histplot(data["product_price"], bins=6, kde=False, color="skyblue", edgecolor="black")
plt.title("Histogram of Product Prices")
plt.xlabel("Product Price ($)")
plt.ylabel("Frequency")
plt.tight_layout()
plt.show()
```
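Each bar in a histogram is simply a count of values falling into one bin. The minimal sketch below prints those counts for the same 12 prices using NumPy's `np.histogram` with `bins=6`. It assumes seaborn splits the data the same way when given an integer `bins` (six equal-width bins spanning the minimum to maximum price), so treat it as an illustration of what the bars mean rather than a guaranteed match to seaborn's internals.

```python
import numpy as np

# Same 12 product prices as in the histogram example
prices = np.array([9.99, 12.49, 7.99, 19.99, 14.99, 11.49,
                   16.99, 9.99, 21.99, 8.49, 15.49, 13.99])

# Six equal-width bins spanning the min-to-max price range;
# `edges` has one more entry than `counts`
counts, edges = np.histogram(prices, bins=6)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"${left:6.2f} to ${right:6.2f}: {count} products")
```

Changing `bins` changes the bin width, which can hide or reveal gaps and spikes, so it is worth trying a few values on real data.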
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Simulated sales amounts for retail transactions
# (a seeded generator makes the example reproducible)
rng = np.random.default_rng(42)
sales_data = pd.DataFrame({
    "sales_amount": rng.gamma(shape=2.0, scale=20.0, size=100)
})

plt.figure(figsize=(6, 4))
sns.kdeplot(sales_data["sales_amount"], fill=True, color="orange")
plt.title("KDE Plot of Sales Amounts")
plt.xlabel("Sales Amount ($)")
plt.ylabel("Density")
plt.tight_layout()
plt.show()
```
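A KDE places a small Gaussian "bump" on every observation and averages the bumps into one smooth curve. The minimal NumPy sketch below computes that estimate by hand. It assumes Scott's rule for the bandwidth (h = s * n^(-1/5), with s the sample standard deviation), which we believe corresponds to seaborn's default `bw_method="scott"`, but it is not guaranteed to reproduce seaborn's curve exactly; `gaussian_kde_manual` is a hypothetical helper, not a library function.

```python
import numpy as np

def gaussian_kde_manual(samples, grid, bandwidth=None):
    """Evaluate a Gaussian KDE of `samples` at every point in `grid` (hypothetical helper)."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Scott's rule of thumb in one dimension: h = s * n^(-1/5)
        bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance from every grid point to every observation
    z = (grid[:, None] - samples[None, :]) / bandwidth
    # Standard normal kernel evaluated at each distance
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Average the bumps and divide by h so the curve integrates to 1
    return kernels.mean(axis=1) / bandwidth

# Seeded generator so the simulated sales amounts are reproducible
rng = np.random.default_rng(42)
sales = rng.gamma(shape=2.0, scale=20.0, size=100)

# Evaluation grid padded beyond the data so the tails are captured
grid = np.linspace(sales.min() - 50, sales.max() + 50, 1000)
density = gaussian_kde_manual(sales, grid)

# Riemann-sum check that the estimate behaves like a proper density
area = density.sum() * (grid[1] - grid[0])
print(f"Highest density near ${grid[density.argmax()]:.2f}")
print(f"Area under the curve: {area:.3f}")
```

Because each Gaussian bump extends indefinitely, the estimate assigns some density to negative sales amounts even though no sale is negative. seaborn's `kdeplot` offers the `cut` and `clip` parameters to control how far beyond the data the curve is evaluated.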
```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Simulated transaction amounts with two high outliers (80 and 87)
transaction_data = pd.DataFrame({
    "transaction_amount": [50, 60, 55, 52, 58, 54, 53, 56, 87, 51, 57, 59, 80]
})

plt.figure(figsize=(4, 6))
sns.boxplot(y=transaction_data["transaction_amount"], color="lightgreen")
plt.title("Boxplot of Transaction Amounts")
plt.ylabel("Transaction Amount ($)")
plt.tight_layout()
plt.show()
```
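By default, the boxplot's whiskers reach the most extreme points within 1.5 times the interquartile range (IQR) of the box, and anything beyond those limits is drawn as an individual outlier marker. The minimal sketch below applies that rule with NumPy to the same transaction amounts, assuming linear-interpolation percentiles (NumPy's default), to show which values the boxplot flags.

```python
import numpy as np

amounts = np.array([50, 60, 55, 52, 58, 54, 53, 56, 87, 51, 57, 59, 80])

# Quartiles and interquartile range (NumPy's default linear interpolation)
q1, median, q3 = np.percentile(amounts, [25, 50, 75])
iqr = q3 - q1

# Tukey's fences: 1.5 * IQR beyond the edges of the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = amounts[(amounts < lower_fence) | (amounts > upper_fence)]
print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"Fences: [{lower_fence}, {upper_fence}]")
print("Flagged as outliers:", sorted(outliers.tolist()))
```

Here Q1 is 53, Q3 is 59, and the upper fence is 68, so the script prints `Flagged as outliers: [80, 87]`, matching the two isolated points in the boxplot.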
Each visualization helps you explore numerical features, but each serves a different purpose in retail data analysis:
- Histograms: show how frequently each value or range of values appears. You can quickly see where most product prices cluster and detect gaps or spikes;
- KDE plots: draw a smooth estimate of the underlying distribution instead of discrete bars, making it easier to see subtle peaks, long tails, and the overall shape of sales amounts;
- Boxplots: summarize data using the median and quartiles, and flag potential outliers as individual points beyond the whiskers. This makes it easy to spot unusually high or low transaction amounts that may need further investigation.
Use these plots together to get a complete picture of your numerical data's distribution and identify key patterns or outliers.