Learn Descriptive Statistics: Understanding Data Through Summary Measures | Foundations of EDA
Exploratory Data Analysis with Python

Descriptive Statistics: Understanding Data Through Summary Measures

```python
import pandas as pd

# Sample retail dataset
data = {
    "Product": ["Shirt", "Pants", "Shoes", "Hat", "Socks", "Jacket", "Belt", "Dress", "Scarf", "Gloves"],
    "Price": [25.99, 40.00, 60.50, 15.00, 5.99, 80.00, 12.99, 55.00, 20.00, 18.50],
    "Units_Sold": [120, 80, 45, 150, 200, 30, 90, 60, 110, 95],
}
df = pd.DataFrame(data)

# Calculate summary statistics for numerical columns.
# mode() returns a sorted Series of all modes; [0] keeps the first one.
# No value repeats in either column here, so every value ties and [0] is simply the minimum.
# std() is the sample standard deviation (ddof=1) by default.
mean_price = df["Price"].mean()
median_price = df["Price"].median()
mode_price = df["Price"].mode()[0]
min_price = df["Price"].min()
max_price = df["Price"].max()
std_price = df["Price"].std()

mean_units = df["Units_Sold"].mean()
median_units = df["Units_Sold"].median()
mode_units = df["Units_Sold"].mode()[0]
min_units = df["Units_Sold"].min()
max_units = df["Units_Sold"].max()
std_units = df["Units_Sold"].std()

print("Price - Mean:", mean_price)
print("Price - Median:", median_price)
print("Price - Mode:", mode_price)
print("Price - Min:", min_price)
print("Price - Max:", max_price)
print("Price - Std Dev:", std_price)
print("Units Sold - Mean:", mean_units)
print("Units Sold - Median:", median_units)
print("Units Sold - Mode:", mode_units)
print("Units Sold - Min:", min_units)
print("Units Sold - Max:", max_units)
print("Units Sold - Std Dev:", std_units)
```

Summary statistics give you a quick, clear overview of your retail data. These measures help you understand key characteristics of your dataset:

  • Mean: the arithmetic average (the sum of the values divided by their count); summarizes the typical value, but is pulled toward extreme values;
  • Median: the middle value when data is sorted; useful for understanding central tendency, especially if your data includes outliers or is skewed;
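  • Mode: the most frequently occurring value; highlights popular prices or sales numbers, although a column can have several modes, and when no value repeats (as in both columns of this sample) every value ties and the mode tells you little;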
  • Minimum and Maximum: show the range of your data, revealing the lowest and highest values for prices or units sold;
  • Standard deviation: measures how spread out the values are; a high value means prices or sales numbers vary widely, while a low value suggests they are clustered near the mean.
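To make each definition concrete, here is a minimal sketch that recomputes these measures in plain Python for the Price column, without pandas. The helper names mean, median, modes and sample_std are illustrative only and not part of any library; sample_std divides by n - 1 on the assumption that you want to match the default of pandas .std().

```python
import math

# Price column from the sample retail dataset above
prices = [25.99, 40.00, 60.50, 15.00, 5.99, 80.00, 12.99, 55.00, 20.00, 18.50]

def mean(values):
    # Arithmetic average: sum of the values divided by their count
    return sum(values) / len(values)

def median(values):
    # Middle value of the sorted data; average of the two middle values when n is even
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2

def modes(values):
    # Every value that occurs with the highest frequency, in sorted order
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

def sample_std(values):
    # Sample standard deviation: divides by n - 1, the same default as pandas .std()
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

print("Mean:", round(mean(prices), 3))
print("Median:", round(median(prices), 3))
print("Modes:", modes(prices))  # no price repeats, so all ten values tie
print("Min/Max:", min(prices), max(prices))
print("Std Dev:", round(sample_std(prices), 2))
```

Writing the formulas out by hand makes it easier to see why the mean moves when one extreme price changes while the median does not.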

In retail, these statistics help you:

  • Quickly spot patterns such as skew or unusually high or low values;
  • Understand variability in your products and sales;
  • Identify products that might need further attention.
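One way to turn "products that might need further attention" into something you can compute is a z-score rule. The sketch below is an assumption, not a method from this lesson: it flags any product whose units sold lie more than a hypothetical THRESHOLD of 1.0 sample standard deviations from the mean, and the cutoff is chosen purely for illustration.

```python
import pandas as pd

# Same sample retail dataset as in the first example
df = pd.DataFrame({
    "Product": ["Shirt", "Pants", "Shoes", "Hat", "Socks", "Jacket", "Belt", "Dress", "Scarf", "Gloves"],
    "Price": [25.99, 40.00, 60.50, 15.00, 5.99, 80.00, 12.99, 55.00, 20.00, 18.50],
    "Units_Sold": [120, 80, 45, 150, 200, 30, 90, 60, 110, 95],
})

# Hypothetical rule: flag a product when its units sold sit more than THRESHOLD
# sample standard deviations from the mean. The cutoff of 1.0 is illustrative only.
THRESHOLD = 1.0

units = df["Units_Sold"]
df["z_units"] = (units - units.mean()) / units.std()  # z-score per product

flagged = df.loc[df["z_units"].abs() > THRESHOLD, ["Product", "Units_Sold", "z_units"]].copy()
flagged["status"] = flagged["z_units"].gt(0).map({True: "high demand", False: "low demand"})
print(flagged.round(2))
```

Because the z-score is built from the mean and standard deviation, a single extreme seller inflates the standard deviation and can hide milder outliers; a median-based rule is a common alternative when that matters.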
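```python
# Generate a summary table using pandas describe()
# (df is the DataFrame built in the previous example; by default describe()
# summarizes only numeric columns, so Product is skipped)
summary = df.describe()
print(summary)
```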

The output of describe() is a summary table with one column for each numerical column in your data. For each column you will see:

  • Count: the number of non-missing entries;
  • Mean: the average value;
  • Standard deviation: how much the values vary from the mean;
  • Minimum: the lowest value;
  • 25th percentile (25%): the value below which 25% of the data falls;
  • Median (50%): the middle value;
  • 75th percentile (75%): the value below which 75% of the data falls;
  • Maximum: the highest value.
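A quick way to check what the percentile rows mean is to compare them with Series.quantile(), which describe() relies on with its default linear interpolation. The sketch below also shows that count skips missing values; the extra NaN price is added only for this demonstration and is not part of the sample data.

```python
import pandas as pd

# Price column from the sample retail dataset
prices = pd.Series([25.99, 40.00, 60.50, 15.00, 5.99, 80.00, 12.99, 55.00, 20.00, 18.50], name="Price")

summary = prices.describe()
# Percentiles in describe() match quantile() with its default linear interpolation
quartiles = prices.quantile([0.25, 0.5, 0.75])
print(summary)
print(quartiles)

# count reports only non-missing values: append one missing price and compare
with_missing = pd.concat([prices, pd.Series([float("nan")], name="Price")], ignore_index=True)
print("Rows:", len(with_missing), "| count:", with_missing.describe()["count"])
```

With ten prices, the 25% value falls a quarter of the way between the 3rd and 4th sorted values, which is why it need not equal any observed price.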

For retail data, this summary helps you:

  • Identify typical prices and units sold;
  • See how much your prices or sales numbers vary;
  • Understand where most of your data points are concentrated.
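The 25% and 75% rows are the usual way to read where the data concentrates: roughly the middle half of the values lies between them, and their difference is the interquartile range (IQR). The sketch below derives it from the describe() table; the variable name iqr is just an illustrative choice.

```python
import pandas as pd

# Numerical columns of the sample retail dataset
df = pd.DataFrame({
    "Price": [25.99, 40.00, 60.50, 15.00, 5.99, 80.00, 12.99, 55.00, 20.00, 18.50],
    "Units_Sold": [120, 80, 45, 150, 200, 30, 90, 60, 110, 95],
})

summary = df.describe()
# IQR = 75th percentile - 25th percentile: width of the band holding roughly the middle half
iqr = summary.loc["75%"] - summary.loc["25%"]
for col in df.columns:
    print(f"{col}: middle half lies between {summary.loc['25%', col]:.2f} and "
          f"{summary.loc['75%', col]:.2f} (IQR = {iqr[col]:.2f})")
```

Unlike the standard deviation, the IQR ignores the most extreme values, so it stays stable when a single bestseller or luxury item is added.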

If the mean price is much higher than the median, your price distribution is likely right-skewed by a few expensive items. A large gap between the minimum and maximum units sold can highlight bestsellers or products with low demand. These insights support smarter decisions about pricing, inventory, and promotions.
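The mean-versus-median comparison can be checked in a few lines. This sketch treats a positive mean minus median, together with a positive value from pandas Series.skew() (a bias-adjusted sample skewness), as a rough hint of a right tail; it is a rule of thumb under that assumption, not a formal test. The range column captures the min-to-max gap mentioned above.

```python
import pandas as pd

# Numerical columns of the sample retail dataset
df = pd.DataFrame({
    "Price": [25.99, 40.00, 60.50, 15.00, 5.99, 80.00, 12.99, 55.00, 20.00, 18.50],
    "Units_Sold": [120, 80, 45, 150, 200, 30, 90, 60, 110, 95],
})

report = {}
for col in ["Price", "Units_Sold"]:
    s = df[col]
    report[col] = {
        "mean_minus_median": s.mean() - s.median(),  # > 0 hints at a long right tail
        "skew": s.skew(),                            # > 0 means right-skewed
        "range": s.max() - s.min(),                  # gap between lowest and highest value
    }
    print(col, {k: round(float(v), 2) for k, v in report[col].items()})
```

In this sample both columns come out right-skewed: a few expensive items lift the mean price, and one very high seller lifts the mean units sold.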


Which statement best explains the difference between the mean and median when analyzing skewed retail sales data?



Section 1. Chapter 2
