Analyzing Numerical–Categorical Relationships | Bivariate and Correlation Analysis
Exploratory Data Analysis with Python

Analyzing Numerical–Categorical Relationships

To analyze how a numerical feature, such as sales amount, varies across categories in your retail dataset, compare the distribution of that feature within each category.

This approach helps you answer questions such as:

  • Do some product types have higher average sales than others?
  • Is the spread of sales wider for some store locations?
  • Which product categories generate the largest sales amounts?
  • Which customer segments tend to spend more?

By comparing numerical data across categories, you can uncover patterns that inform business decisions and highlight key differences between groups.
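A per-category numeric summary is a quick first check before plotting: the mean and median speak to "higher average sales", while the standard deviation and min/max speak to spread. The sketch below rebuilds the same small example dataset used in this chapter's boxplot code and relies only on pandas `groupby` with `agg`; sorting by median is an illustrative choice, not a required step.

```python
import pandas as pd

# Same small example retail dataset as the boxplot code in this chapter
data = {
    "ProductCategory": ["Electronics", "Clothing", "Electronics", "Furniture", "Clothing",
                        "Furniture", "Electronics", "Clothing", "Furniture", "Electronics"],
    "SalesAmount": [200, 150, 300, 400, 160, 380, 150, 95, 420, 250],
}
df = pd.DataFrame(data)

# One row per category: group size, center (mean, median) and spread (std, min, max).
# pandas' "std" is the sample standard deviation (ddof=1).
summary = (
    df.groupby("ProductCategory")["SalesAmount"]
    .agg(["count", "mean", "median", "std", "min", "max"])
    .sort_values("median", ascending=False)  # categories with the highest typical sale first
)
print(summary.round(1))
```

With this data, Furniture has the highest median (400), followed by Electronics (225) and Clothing (150). Keep in mind that each category has only three or four rows here, so these numbers are illustrative rather than reliable estimates.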

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Example retail dataset
data = {
    "ProductCategory": ["Electronics", "Clothing", "Electronics", "Furniture", "Clothing",
                        "Furniture", "Electronics", "Clothing", "Furniture", "Electronics"],
    "SalesAmount": [200, 150, 300, 400, 160, 380, 150, 95, 420, 250],
}
df = pd.DataFrame(data)

# Create a boxplot to compare sales amounts across product categories
plt.figure(figsize=(8, 5))
sns.boxplot(x="ProductCategory", y="SalesAmount", data=df)
plt.title("Sales Amount Distribution by Product Category")
plt.xlabel("Product Category")
plt.ylabel("Sales Amount")
plt.show()
```

When you interpret boxplots that compare sales amounts across product categories:

  • The line inside each box marks the median, the typical sales value for that category;
  • The height of the box (the interquartile range, from the 25th to the 75th percentile) and the length of the whiskers show how variable sales amounts are within that category;
  • Points drawn individually beyond the whiskers are outliers (by default, values more than 1.5 times the interquartile range below the first quartile or above the third quartile); they may indicate unusually high or low sales;
  • Differences in medians reveal which categories tend to sell more or less;
  • Differences in spread and outliers can highlight categories with inconsistent sales or rare but significant transactions.
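To connect each part of the plot to actual numbers, the sketch below computes, per category, the quartiles, median, interquartile range, whisker ends and outlier count. `box_stats` is a hypothetical helper written for illustration, not a pandas or seaborn function. It assumes the default 1.5 × IQR whisker rule and linearly interpolated quartiles, which is how matplotlib (and therefore seaborn's `boxplot`) builds boxes under default settings, so the values should line up with the plotted boxes.

```python
import pandas as pd

# Same small example retail dataset as the boxplot code in this chapter
data = {
    "ProductCategory": ["Electronics", "Clothing", "Electronics", "Furniture", "Clothing",
                        "Furniture", "Electronics", "Clothing", "Furniture", "Electronics"],
    "SalesAmount": [200, 150, 300, 400, 160, 380, 150, 95, 420, 250],
}
df = pd.DataFrame(data)


def box_stats(values: pd.Series, whis: float = 1.5) -> pd.Series:
    """Hypothetical helper: the numbers a Tukey-style boxplot draws for one group."""
    q1 = values.quantile(0.25)    # bottom edge of the box
    q3 = values.quantile(0.75)    # top edge of the box
    iqr = q3 - q1                 # box height (interquartile range)
    low_fence = q1 - whis * iqr   # values below this are drawn as outliers
    high_fence = q3 + whis * iqr  # values above this are drawn as outliers
    inside = values[(values >= low_fence) & (values <= high_fence)]
    return pd.Series({
        "q1": q1,
        "median": values.median(),    # line inside the box
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside.min(),  # whiskers stop at the most extreme non-outlier values
        "whisker_high": inside.max(),
        "n_outliers": len(values) - len(inside),
    })


# One row of boxplot statistics per product category
stats = pd.DataFrame(
    {name: box_stats(group) for name, group in df.groupby("ProductCategory")["SalesAmount"]}
).T
print(stats)
```

In this small dataset every value falls inside its category's fences, so `n_outliers` is 0 for all three categories and the boxplot shows no individual points. Calling `box_stats` on a series with an extreme value, such as `pd.Series([1, 2, 3, 4, 100])`, reports one outlier and stops the upper whisker at 4.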



Section 3, Chapter 2
