Learn Descriptive Statistics: Understanding Data Through Summary Measures


import pandas as pd

# Sample retail dataset
data = {
    "Product": ["Shirt", "Pants", "Shoes", "Hat", "Socks", "Jacket", "Belt", "Dress", "Scarf", "Gloves"],
    "Price": [25.99, 40.00, 60.50, 15.00, 5.99, 80.00, 12.99, 55.00, 20.00, 18.50],
    "Units_Sold": [120, 80, 45, 150, 200, 30, 90, 60, 110, 95]
}

df = pd.DataFrame(data)

# Calculate summary statistics for numerical columns
mean_price = df["Price"].mean()
median_price = df["Price"].median()
# mode() returns every tied value in ascending order; [0] keeps the smallest one
mode_price = df["Price"].mode()[0]
min_price = df["Price"].min()
max_price = df["Price"].max()
# std() computes the sample standard deviation (ddof=1) by default
std_price = df["Price"].std()

mean_units = df["Units_Sold"].mean()
median_units = df["Units_Sold"].median()
mode_units = df["Units_Sold"].mode()[0]
min_units = df["Units_Sold"].min()
max_units = df["Units_Sold"].max()
std_units = df["Units_Sold"].std()

print("Price - Mean:", mean_price)
print("Price - Median:", median_price)
print("Price - Mode:", mode_price)
print("Price - Min:", min_price)
print("Price - Max:", max_price)
print("Price - Std Dev:", std_price)
print("Units Sold - Mean:", mean_units)
print("Units Sold - Median:", median_units)
print("Units Sold - Mode:", mode_units)
print("Units Sold - Min:", min_units)
print("Units Sold - Max:", max_units)
print("Units Sold - Std Dev:", std_units)

Summary statistics give you a quick, clear overview of your retail data. These measures help you understand key characteristics of your dataset:

Mean: the arithmetic average; shows the typical value, but can be influenced by extreme values;
Median: the middle value when data is sorted; useful for understanding central tendency, especially if your data includes outliers or is skewed;
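Mode: the most frequently occurring value; highlights popular prices or sales numbers, but is only informative when values repeat (in this sample every price and every unit count is unique, so mode()[0] simply returns the smallest value);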
Minimum and Maximum: show the range of your data, revealing the lowest and highest values for prices or units sold;
Standard deviation: measures how spread out the values are around the mean; a high value means prices or sales numbers vary widely, while a low value suggests they are clustered near the mean. By default, pandas std() returns the sample standard deviation (ddof=1).
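Two of the measures above have details that are easy to miss in pandas: how mode() behaves when no value repeats, and which version of the standard deviation std() returns. The following is a minimal sketch using the same prices as the sample dataset; the variable names are illustrative and not part of the original lesson.

```python
import pandas as pd

# Same prices as the sample retail dataset
prices = pd.Series([25.99, 40.00, 60.50, 15.00, 5.99, 80.00, 12.99, 55.00, 20.00, 18.50])

# mode() returns ALL values tied for the highest frequency, sorted ascending.
# Every price here occurs once, so all ten values come back.
modes = prices.mode()
has_single_mode = len(modes) == 1

# std() defaults to the sample standard deviation (ddof=1);
# pass ddof=0 to get the population standard deviation instead.
sample_std = prices.std()
population_std = prices.std(ddof=0)

print("Number of modes:", len(modes))
print("Single clear mode?", has_single_mode)
print("Sample std:", sample_std)
print("Population std:", population_std)
```

Checking len(modes) before reading mode()[0] is one way to avoid reporting a "most common" value that is really just the smallest of several ties.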

In retail, these statistics help you:

Quickly spot trends;
Understand variability in your products and sales;
Identify products that might need further attention.
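One possible way to turn "products that might need further attention" into code is to flag items whose sales sit unusually far from the mean, measured in standard deviations (a z-score). This is only a sketch: the cutoff of 1.5 and the names Z_THRESHOLD and Units_Z are assumptions made for illustration, not something the lesson prescribes.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Shirt", "Pants", "Shoes", "Hat", "Socks", "Jacket", "Belt", "Dress", "Scarf", "Gloves"],
    "Units_Sold": [120, 80, 45, 150, 200, 30, 90, 60, 110, 95],
})

# Hypothetical cutoff: how many standard deviations from the mean counts as "unusual"
Z_THRESHOLD = 1.5

units = df["Units_Sold"]
# z-score: distance from the mean in units of the (sample) standard deviation
df["Units_Z"] = (units - units.mean()) / units.std()

# Keep products whose absolute z-score exceeds the cutoff
flagged = df.loc[df["Units_Z"].abs() > Z_THRESHOLD, "Product"].tolist()
print("Products to review:", flagged)
```

With only ten products the z-scores are rough, so the cutoff should be treated as a starting point to tune rather than a rule.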


# Generate a summary table using pandas describe()
summary = df.describe()
print(summary)

The output from describe() provides a summary table for each numerical column in your data. You will see:

Count: the number of non-missing entries;
Mean: the average value;
Standard deviation: how much the values vary from the mean;
Minimum: the lowest value;
25th percentile (25%): the value below which 25% of data falls;
Median (50%): the middle value;
75th percentile (75%): the value below which 75% of data falls;
Maximum: the highest value.
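The percentile rows listed above can also be computed directly with quantile(), which helps to see what describe() reports. The sketch below uses the units sold from the sample dataset; the interquartile range (iqr) is an extra illustration and is not part of the describe() output.

```python
import pandas as pd

# Same units sold as the sample retail dataset
units = pd.Series([120, 80, 45, 150, 200, 30, 90, 60, 110, 95], name="Units_Sold")

# quantile() uses linear interpolation by default, the same as describe()
q1 = units.quantile(0.25)
median = units.quantile(0.50)
q3 = units.quantile(0.75)

# Interquartile range: the width of the middle 50% of the data
iqr = q3 - q1

summary = units.describe()
print("25%:", q1, "| describe():", summary["25%"])
print("50%:", median, "| describe():", summary["50%"])
print("75%:", q3, "| describe():", summary["75%"])
print("IQR:", iqr)
```

The values between the 25% and 75% rows cover the middle half of the products, which is where "most of your data points are concentrated".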

For retail data, this summary helps you:

Identify typical prices and units sold;
See how much your prices or sales numbers vary;
Understand where most of your data points are concentrated.

If the mean price is much higher than the median, your prices are likely skewed by a few expensive items. A large gap between the minimum and maximum units sold can highlight bestsellers or products with low demand. These insights support smarter decisions about pricing, inventory, and promotions.
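These two checks can be sketched on the sample dataset as follows. In this data the mean price (about 33.40) is noticeably higher than the median (about 23.00), which matches the pattern of a few expensive items pulling the average up. The names price_gap, best_seller, and slowest_seller are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Shirt", "Pants", "Shoes", "Hat", "Socks", "Jacket", "Belt", "Dress", "Scarf", "Gloves"],
    "Price": [25.99, 40.00, 60.50, 15.00, 5.99, 80.00, 12.99, 55.00, 20.00, 18.50],
    "Units_Sold": [120, 80, 45, 150, 200, 30, 90, 60, 110, 95],
})

# A positive gap (mean above median) hints that high prices skew the average upward
mean_price = df["Price"].mean()
median_price = df["Price"].median()
price_gap = mean_price - median_price

# idxmax()/idxmin() return the row labels of the largest and smallest values
best_seller = df.loc[df["Units_Sold"].idxmax(), "Product"]
slowest_seller = df.loc[df["Units_Sold"].idxmin(), "Product"]

print("Mean minus median price:", price_gap)
print("Best seller:", best_seller)
print("Slowest seller:", slowest_seller)
```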


