Visualizing Numerical Features: Histograms, KDE, Boxplots

Understanding how numerical features are distributed is a key part of exploratory data analysis in retail datasets. Three essential visualization tools for this purpose are histograms, kernel density estimation (KDE) plots, and boxplots. Each method provides a different perspective on the shape, spread, and characteristics of your data. In retail analysis, these plots help you uncover trends in product prices, sales amounts, and transaction values, making it easier to spot patterns, skewness, or potential outliers that could affect business decisions.


import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Sample retail data
data = pd.DataFrame({
    "product_price": [9.99, 12.49, 7.99, 19.99, 14.99, 11.49, 16.99, 9.99, 21.99, 8.49, 15.49, 13.99]
})

plt.figure(figsize=(6, 4))
sns.histplot(data["product_price"], bins=6, kde=False, color="skyblue", edgecolor="black")
plt.title("Histogram of Product Prices")
plt.xlabel("Product Price ($)")
plt.ylabel("Frequency")
plt.tight_layout()
plt.show()
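
To see the numbers behind the bars, the same binning can be reproduced with NumPy. This is a minimal sketch, assuming that `bins=6` splits the observed price range into six equal-width bins, which is how `np.histogram` treats an integer `bins` argument. The variable names `prices`, `counts`, and `edges` are illustrative only.

```python
import numpy as np

# Same product prices as in the histogram example
prices = np.array([9.99, 12.49, 7.99, 19.99, 14.99, 11.49,
                   16.99, 9.99, 21.99, 8.49, 15.49, 13.99])

# Six equal-width bins between the minimum and maximum price
counts, edges = np.histogram(prices, bins=6)

# Print each bin's price range and how many products fall into it
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"${left:6.2f} to ${right:6.2f}: {count} products")
```

Changing `bins` changes the story the histogram tells: too few bins hide gaps between price points, while too many bins on a small sample leave most bars at zero or one.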


              1234567891011121314
            
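import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Simulated, right-skewed sales amounts for retail transactions
# (seeded so the plot is reproducible)
rng = np.random.default_rng(42)
sales_data = pd.DataFrame({
    "sales_amount": rng.gamma(shape=2.0, scale=20.0, size=100)
})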

plt.figure(figsize=(6, 4))
sns.kdeplot(sales_data["sales_amount"], fill=True, color="orange")
plt.title("KDE Plot of Sales Amounts")
plt.xlabel("Sales Amount ($)")
plt.ylabel("Density")
plt.tight_layout()
plt.show()
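
A KDE curve is the average of small bell-shaped kernels, one centered on each observation. The sketch below builds a Gaussian KDE by hand with NumPy to illustrate the idea. It is not seaborn's implementation: the helper `gaussian_kde_1d`, the seed, and the rule-of-thumb bandwidth (sample standard deviation times n^(-1/5), often called Scott's rule) are assumptions made for this example.

```python
import numpy as np

# Simulated sales amounts (seeded for reproducibility; the seed is arbitrary)
rng = np.random.default_rng(0)
sales = rng.gamma(shape=2.0, scale=20.0, size=100)


def gaussian_kde_1d(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    # Standardized distance from every grid point to every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    # Standard normal kernel evaluated at those distances
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    # Average the kernels and rescale by the bandwidth
    return kernels.sum(axis=1) / (len(samples) * bandwidth)


# Rule-of-thumb bandwidth for one-dimensional data (an assumption here)
bandwidth = sales.std(ddof=1) * len(sales) ** (-1 / 5)

# Evaluate the density on a grid that extends past the data on both sides
grid = np.linspace(sales.min() - 4 * bandwidth, sales.max() + 4 * bandwidth, 500)
density = gaussian_kde_1d(sales, grid, bandwidth)

# A density should integrate to roughly 1 (simple Riemann sum)
area = density.sum() * (grid[1] - grid[0])
print(f"bandwidth: {bandwidth:.2f}, area under curve: {area:.3f}")
```

A larger bandwidth gives a smoother curve that may hide real peaks, and a smaller one gives a bumpier curve that follows noise. Because each kernel spreads in both directions, the smoothed curve can extend slightly below zero even though sales amounts cannot be negative.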


import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Simulated transaction amounts with two high outliers (80 and 87)
transaction_data = pd.DataFrame({
    "transaction_amount": [50, 60, 55, 52, 58, 54, 53, 56, 87, 51, 57, 59, 80]
})

plt.figure(figsize=(4, 6))
sns.boxplot(y=transaction_data["transaction_amount"], color="lightgreen")
plt.title("Boxplot of Transaction Amounts")
plt.ylabel("Transaction Amount ($)")
plt.tight_layout()
plt.show()
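
The points drawn beyond the whiskers follow a simple rule. The sketch below assumes the common default of placing the whisker limits 1.5 times the interquartile range (IQR) beyond the quartiles, and recomputes those limits with pandas. The names `q1`, `q3`, `lower`, `upper`, and `outliers` are illustrative.

```python
import pandas as pd

# Same transaction amounts as in the boxplot example
amounts = pd.Series([50, 60, 55, 52, 58, 54, 53, 56, 87, 51, 57, 59, 80])

# Quartiles and interquartile range
q1 = amounts.quantile(0.25)
q3 = amounts.quantile(0.75)
iqr = q3 - q1

# Values outside these fences are treated as potential outliers
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = amounts[(amounts < lower) | (amounts > upper)]

print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"Fences: [{lower}, {upper}]")
print("Potential outliers:", outliers.tolist())
```

For this sample the quartiles are 53 and 59, so the fences sit at 44 and 68, and the two transactions of 80 and 87 are flagged. Whether such values are errors or genuinely large purchases is a business question the plot cannot answer on its own.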

Each visualization helps you explore numerical features, but they serve different purposes in retail data analysis:

Histograms: show how often values fall within each range (bin). You can quickly see where most product prices cluster and detect gaps or spikes;
KDE plots: provide a smoothed estimate of the distribution, making it easier to see the overall shape, subtle peaks, and tails in sales amounts;
Boxplots: summarize the data with the median and quartiles and flag potential outliers. This makes it easy to spot unusually high or low transaction amounts that may need further investigation.

Use these plots together to get a complete picture of your numerical data's distribution and identify key patterns or outliers.
