Descriptive Statistics: Understanding Data Through Summary Measures
    import pandas as pd

    # Sample retail dataset
    data = {
        "Product": ["Shirt", "Pants", "Shoes", "Hat", "Socks", "Jacket", "Belt", "Dress", "Scarf", "Gloves"],
        "Price": [25.99, 40.00, 60.50, 15.00, 5.99, 80.00, 12.99, 55.00, 20.00, 18.50],
        "Units_Sold": [120, 80, 45, 150, 200, 30, 90, 60, 110, 95]
    }
    df = pd.DataFrame(data)

    # Calculate summary statistics for numerical columns
    mean_price = df["Price"].mean()
    median_price = df["Price"].median()
    mode_price = df["Price"].mode()[0]
    min_price = df["Price"].min()
    max_price = df["Price"].max()
    std_price = df["Price"].std()

    mean_units = df["Units_Sold"].mean()
    median_units = df["Units_Sold"].median()
    mode_units = df["Units_Sold"].mode()[0]
    min_units = df["Units_Sold"].min()
    max_units = df["Units_Sold"].max()
    std_units = df["Units_Sold"].std()

    print("Price - Mean:", mean_price)
    print("Price - Median:", median_price)
    print("Price - Mode:", mode_price)
    print("Price - Min:", min_price)
    print("Price - Max:", max_price)
    print("Price - Std Dev:", std_price)

    print("Units Sold - Mean:", mean_units)
    print("Units Sold - Median:", median_units)
    print("Units Sold - Mode:", mode_units)
    print("Units Sold - Min:", min_units)
    print("Units Sold - Max:", max_units)
    print("Units Sold - Std Dev:", std_units)
Summary statistics give you a quick, clear overview of your retail data. These measures help you understand key characteristics of your dataset:
- Mean: the arithmetic average; shows the typical value, but can be influenced by extreme values;
- Median: the middle value when data is sorted; useful for understanding central tendency, especially if your data includes outliers or is skewed;
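- Mode: the most frequently occurring value; highlights popular prices or sales numbers, but is only informative when values repeat (in the sample data every price and every units-sold figure is unique, so mode()[0] simply returns the smallest value);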
- Minimum and Maximum: show the range of your data, revealing the lowest and highest values for prices or units sold;
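- Standard deviation: measures how spread out the values are around the mean; a high value means prices or sales numbers vary widely, while a low value means they cluster near the mean. Note that the pandas std() method returns the sample standard deviation (ddof=1) by default.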
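To make the point about extreme values concrete, here is a minimal sketch using the prices from the sample dataset. The 500.00 price is a hypothetical value added purely for illustration; it is not part of the original data. The sketch also shows what mode() returns when no value repeats.

```python
import pandas as pd

# Prices from the sample retail dataset
prices = pd.Series([25.99, 40.00, 60.50, 15.00, 5.99, 80.00, 12.99, 55.00, 20.00, 18.50])

# Hypothetical extreme price, added only to illustrate outlier sensitivity
with_outlier = pd.concat([prices, pd.Series([500.00])], ignore_index=True)

# The mean shifts a lot when one extreme value is added; the median barely moves
print("Mean without / with outlier:  ", round(prices.mean(), 2), "/", round(with_outlier.mean(), 2))
print("Median without / with outlier:", round(prices.median(), 2), "/", round(with_outlier.median(), 2))

# mode() returns every tied value; all prices are unique here,
# so the result has as many entries as the data itself
modes = prices.mode()
print("Number of modes:", len(modes))
```

When the number of modes equals the number of rows, the mode carries no information about a "popular" value, and the median is usually the safer summary of a typical price.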
In retail, these statistics help you:
- Quickly spot trends;
- Understand variability in your products and sales;
- Identify products that might need further attention.
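One possible way to find products that might need attention is to measure how far each product's sales sit from the mean, in units of standard deviation. The sketch below is an illustration under stated assumptions: the column name `z_units` and the cutoff of one standard deviation are hypothetical choices, not something the lesson prescribes.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Shirt", "Pants", "Shoes", "Hat", "Socks", "Jacket", "Belt", "Dress", "Scarf", "Gloves"],
    "Units_Sold": [120, 80, 45, 150, 200, 30, 90, 60, 110, 95],
})

mean_units = df["Units_Sold"].mean()
std_units = df["Units_Sold"].std()  # sample standard deviation (ddof=1)

# Distance from the mean, expressed in standard deviations (a z-score)
df["z_units"] = (df["Units_Sold"] - mean_units) / std_units

# Assumed cutoff: more than one standard deviation away from the mean
threshold = 1.0
flagged = df[df["z_units"].abs() > threshold]

print(flagged[["Product", "Units_Sold", "z_units"]])
```

Products with a large positive score are unusually strong sellers, while a large negative score points to weak demand. A stricter or looser cutoff may suit other datasets better.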
    # Generate a summary table using pandas describe()
    summary = df.describe()
    print(summary)
The output from describe() is a summary table with one column for each numerical column in your data; the text column Product is left out by default. You will see:
- Count: the number of non-missing entries;
- Mean: the average value;
- Standard deviation (std): how much the values vary from the mean;
- Minimum: the lowest value;
- 25th percentile (25%): the value below which 25% of data falls;
- Median (50%): the middle value;
- 75th percentile (75%): the value below which 75% of data falls;
- Maximum: the highest value.
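Because describe() returns an ordinary DataFrame, individual statistics can be read back by row label. The following sketch shows one way to do that, along with requesting different percentiles; the variable names `iqr_price` and `custom` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Shirt", "Pants", "Shoes", "Hat", "Socks", "Jacket", "Belt", "Dress", "Scarf", "Gloves"],
    "Price": [25.99, 40.00, 60.50, 15.00, 5.99, 80.00, 12.99, 55.00, 20.00, 18.50],
    "Units_Sold": [120, 80, 45, 150, 200, 30, 90, 60, 110, 95],
})

summary = df.describe()

# Rows are labeled by statistic, columns by the original numeric column
mean_price = summary.loc["mean", "Price"]
median_price = summary.loc["50%", "Price"]

# Interquartile range: the spread of the middle 50% of prices
iqr_price = summary.loc["75%", "Price"] - summary.loc["25%", "Price"]

# Request the 10th, 50th and 90th percentiles instead of the default quartiles
custom = df.describe(percentiles=[0.1, 0.5, 0.9])

print("Mean price:", round(mean_price, 2))
print("Median price:", round(median_price, 2))
print("Price IQR:", round(iqr_price, 2))
print(custom)
```

The interquartile range is a useful companion to the standard deviation because, like the median, it is not pulled around by a few extreme values.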
For retail data, this summary helps you:
- Identify typical prices and units sold;
- See how much your prices or sales numbers vary;
- Understand where most of your data points are concentrated.
If the mean price is much higher than the median, your prices are likely skewed by a few expensive items. In the sample data the mean price (about 33.40) sits well above the median (about 23.00), which points to exactly this kind of right skew. A large gap between the minimum and maximum units sold can highlight bestsellers or products with low demand. These insights support smarter decisions about pricing, inventory, and promotions.
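These two checks can be sketched in a few lines. This is a rough illustration rather than a definitive method: the variable names are hypothetical, and skew() is used here only as an additional sign check on the mean-versus-median comparison.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Shirt", "Pants", "Shoes", "Hat", "Socks", "Jacket", "Belt", "Dress", "Scarf", "Gloves"],
    "Price": [25.99, 40.00, 60.50, 15.00, 5.99, 80.00, 12.99, 55.00, 20.00, 18.50],
    "Units_Sold": [120, 80, 45, 150, 200, 30, 90, 60, 110, 95],
})

# Skew check: a mean above the median suggests a few high prices pull the average up
mean_price = df["Price"].mean()
median_price = df["Price"].median()
price_skew = df["Price"].skew()  # positive values indicate a longer right tail
print("Mean above median:", mean_price > median_price, "| skewness positive:", price_skew > 0)

# Range check: find the products at the extremes of units sold
units_range = df["Units_Sold"].max() - df["Units_Sold"].min()
bestseller = df.loc[df["Units_Sold"].idxmax(), "Product"]
lowest_demand = df.loc[df["Units_Sold"].idxmin(), "Product"]
print("Units sold range:", units_range)
print("Bestseller:", bestseller, "| Lowest demand:", lowest_demand)
```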