Summary
This chapter demonstrates how to perform univariate analysis by extracting a single variable, computing descriptive statistics (mean, median, mode, standard deviation, min, max), and visualizing its distribution using a histogram and boxplot.

General domain of usage
Exploratory data analysis

Univariate analysis is a foundational step in **exploratory data analysis (EDA)**, focusing on examining each variable in your dataset independently. By analyzing variables one at a time, you can uncover essential characteristics such as **central tendency**, **spread**, **shape**, and the presence of **outliers**. This process helps you understand the basic properties of your data, identify potential data quality issues, and select appropriate techniques for further analysis. Univariate analysis is crucial for building intuition about your dataset before moving on to more complex, multivariate relationships.

```python
import pandas as pd

# Load a sample dataset
url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv"
df = pd.read_csv(url)

# Select a single column for analysis: "total_bill"
total_bill = df["total_bill"]
print(total_bill.head())
```

```python
# Calculate descriptive statistics for the "total_bill" variable
mean = total_bill.mean()
median = total_bill.median()
mode = total_bill.mode()[0]
std = total_bill.std()
min_value = total_bill.min()
max_value = total_bill.max()

print(f"Mean: {mean:.2f}")
print(f"Median: {median:.2f}")
print(f"Mode: {mode:.2f}")
print(f"Standard Deviation: {std:.2f}")
print(f"Min: {min_value:.2f}")
print(f"Max: {max_value:.2f}")
```
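
Most of the statistics computed above can also be obtained in one call with `describe()`, and the mode needs some care because several values can tie. The following is a minimal sketch on a small hypothetical series (the values are made up for illustration and are not taken from the tips dataset):

```python
import pandas as pd

# Hypothetical bill amounts, chosen so that two values tie for the mode
sample_bills = pd.Series([10.0, 12.5, 12.5, 20.0, 20.0, 31.0])

# describe() returns count, mean, std, min, quartiles, and max in one Series
summary = sample_bills.describe()
print(summary)

# mode() returns every tied value in ascending order, not a single number
modes = sample_bills.mode()
print("Modes:", modes.tolist())

# Indexing with [0] therefore keeps only the smallest tied value
print("First mode:", modes[0])
```

The `50%` entry of `describe()` is the median, so the two approaches can be cross-checked against each other.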

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Create a histogram
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
sns.histplot(total_bill, bins=20, kde=True)
plt.title("Histogram of Total Bill")
plt.xlabel("Total Bill")
plt.ylabel("Frequency")

# Create a boxplot
plt.subplot(1, 2, 2)
sns.boxplot(x=total_bill)
plt.title("Boxplot of Total Bill")
plt.xlabel("Total Bill")

plt.tight_layout()
plt.show()
```
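
A histogram is a set of counts over equal-width bins, so the numbers behind the bars can be inspected directly. The sketch below uses `numpy.histogram` on a small hypothetical sample to show the idea. It illustrates the binning concept only, and the exact bin edges seaborn picks for a real plot may differ:

```python
import numpy as np

# Hypothetical bill amounts (not from the tips dataset)
sample_bills = np.array([10.0, 12.0, 13.0, 15.0, 16.0, 18.0, 50.0])

# Split the range from min to max into 4 equal-width bins and count values per bin
counts, edges = np.histogram(sample_bills, bins=4)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"[{left:.0f}, {right:.0f}): {count}")
```

Empty bins between a cluster of values and a lone high value are a numeric hint of the same gap that would show up visually in the plot.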

### Interpreting Descriptive Statistics

- **Mean:** shows the average value of `total_bill`;
- **Median:** gives the middle value when all bills are sorted;
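- **Mode:** identifies the most frequently occurring bill amount. Several values can tie, in which case `mode()` returns all of them in ascending order and `[0]` picks the smallest;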
- **Standard deviation:** measures how spread out values are around the mean. A higher value means more variability;
- **Minimum and maximum:** indicate the range of the data.

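If the **mean** and **median** are close, the distribution is likely roughly symmetric. If the mean is noticeably larger than the median, the data is probably right-skewed (a tail of large values pulls the mean up); if it is noticeably smaller, the data is probably left-skewed.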
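
The mean-versus-median comparison can be checked numerically. This is a minimal sketch on a hypothetical series with one large value in the right tail; `Series.skew()` is used here as a supplementary measure that the lesson code itself does not compute:

```python
import pandas as pd

# Hypothetical bill amounts with a single large value in the right tail
sample_bills = pd.Series([10.0, 12.0, 13.0, 15.0, 16.0, 18.0, 50.0])

mean = sample_bills.mean()
median = sample_bills.median()
skewness = sample_bills.skew()  # sample skewness; positive suggests a right tail

print(f"Mean: {mean:.2f}")
print(f"Median: {median:.2f}")
print(f"Skewness: {skewness:.2f}")

if mean > median:
    print("Mean exceeds median: the data is likely right-skewed")
```

Here the single large bill pulls the mean above the median, which is the pattern to look for in a right-skewed variable.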

### Understanding Visualizations

- **Histogram:** displays the distribution of `total_bill`. Peaks indicate common values, and the overall shape (symmetric, skewed left, or skewed right) reveals how most bills are distributed;
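- **Boxplot:** summarizes the spread, median, and potential outliers. The box shows the interquartile range (IQR, the middle 50% of the data), the line inside the box is the median, and the whiskers extend to the most extreme points within 1.5 × IQR of the box edges (the seaborn default). Points beyond the whiskers are drawn individually and flagged as potential outliers.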
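
The outlier rule behind a boxplot can be reproduced by hand. The sketch below assumes the common 1.5 × IQR convention and uses a small hypothetical series; the variable names `lower_fence` and `upper_fence` are illustrative:

```python
import pandas as pd

# Hypothetical bill amounts (not from the tips dataset)
sample_bills = pd.Series([10.0, 12.0, 13.0, 15.0, 16.0, 18.0, 50.0])

# Quartiles and interquartile range
q1 = sample_bills.quantile(0.25)
q3 = sample_bills.quantile(0.75)
iqr = q3 - q1

# Values beyond these fences would be drawn as individual points
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = sample_bills[(sample_bills < lower_fence) | (sample_bills > upper_fence)]

print(f"Q1: {q1}, Q3: {q3}, IQR: {iqr}")
print(f"Fences: [{lower_fence}, {upper_fence}]")
print("Potential outliers:", outliers.tolist())
```

Listing the flagged values this way makes it easier to decide whether they are data errors or genuine extreme observations before any cleaning step.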

By combining these statistics and visualizations, you can quickly spot unusual values, skewness, and the general pattern of your variable. This understanding guides your next steps in data cleaning and analysis.
