Summary  
This chapter covers generating random samples to build empirical distributions, computing the empirical distribution function, and visualizing outcome frequencies with histograms.  

General domain of usage  
Statistical simulation

Understanding how to build and interpret **empirical distributions** is a core skill in simulation-based inference. An empirical distribution is constructed by collecting a large number of outcomes from a random process—such as rolling a die or simulating a game—and then examining how frequently each outcome occurs. Unlike a theoretical distribution, which is derived from mathematical formulas and assumptions, an empirical distribution is based on actual data generated from random sampling. This approach allows you to visualize and analyze the behavior of complex systems, even when the underlying probabilities are unknown or difficult to compute.
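As a minimal sketch of this idea, the snippet below builds an empirical distribution for a process chosen purely for illustration: the sum of two fair dice. The process, the seed, and names such as `totals` and `relative_freq` are assumptions and are not part of the chapter's own example.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed, used only for reproducibility
num_trials = 10_000

# Hypothetical random process: the sum of two fair six-sided dice
totals = rng.integers(1, 7, size=(num_trials, 2)).sum(axis=1)

# Empirical distribution: each observed outcome with its relative frequency
outcomes, counts = np.unique(totals, return_counts=True)
relative_freq = counts / num_trials

for outcome, freq in zip(outcomes, relative_freq):
    print(f"sum = {int(outcome):2d}: {freq:.4f}")
```

The relative frequencies come entirely from the simulated data, so the same pattern applies to processes whose probabilities are harder to derive by hand.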

A key mathematical tool for describing empirical distributions is the **empirical distribution function (EDF)**. For a sample of size $$n$$, the EDF at a value $$x$$ is given by:

$$
F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \leq x)
$$

Here, $$I(X_i \leq x)$$ is an indicator function that equals 1 if the sample point $$X_i$$ is less than or equal to $$x$$, and 0 otherwise. The EDF $$F_n(x)$$ is therefore the proportion of sample points less than or equal to $$x$$. It is a step function that approximates the true cumulative distribution function, and the approximation improves as the sample size $$n$$ increases.
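The definition can be sketched directly in NumPy. The helper name `edf`, the seed, and the choice of evaluation points are assumptions made for illustration; the function counts how many sample points are less than or equal to each `x` and divides by the sample size.

```python
import numpy as np

def edf(sample, x):
    """Return the proportion of sample points <= x (x may be a scalar or an array)."""
    sorted_sample = np.sort(np.asarray(sample))
    # With side="right", searchsorted returns the number of sorted values <= x
    return np.searchsorted(sorted_sample, x, side="right") / sorted_sample.size

rng = np.random.default_rng(seed=0)  # arbitrary seed, used only for reproducibility
rolls = rng.integers(1, 7, size=10_000)  # simulated rolls of a fair six-sided die

faces = np.arange(1, 7)
edf_values = edf(rolls, faces)

for face, value in zip(faces, edf_values):
    print(f"F_n({int(face)}) = {value:.4f}   (fair-die CDF: {face / 6:.4f})")
```

For a fair die the true cumulative distribution at face $$k$$ is $$k/6$$, so the printed pairs give a direct sense of how close the EDF is for this sample size.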

import numpy as np
import matplotlib.pyplot as plt

# Simulate rolling a fair six-sided die 10,000 times
num_rolls = 10000
samples = np.random.randint(1, 7, size=num_rolls)

# Plot the empirical distribution as a histogram
plt.figure(figsize=(8, 5))
plt.hist(samples, bins=np.arange(1, 8) - 0.5, edgecolor='black', rwidth=0.8)
plt.title("Empirical Distribution of 10,000 Dice Rolls")
plt.xlabel("Dice Face")
plt.ylabel("Frequency")
plt.xticks(range(1, 7))
plt.show()

The histogram represents the **empirical distribution** of outcomes from 10,000 simulated dice rolls. Each bar shows how many times a particular face appeared. For a fair die, the theoretical distribution assigns each face a probability of one-sixth, so each bar should be close to 10,000 / 6 ≈ 1,667. With smaller samples, the relative frequencies fluctuate more due to random chance; as the sample size increases, the share of rolls landing on each face approaches one-sixth, and the empirical distribution becomes a more accurate reflection of the true underlying probabilities. This illustrates how simulation can be used to approximate complex distributions, and it is consistent with the Law of Large Numbers.
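To see the effect of sample size numerically, the sketch below repeats the die simulation for several sample sizes and reports the largest gap between any face's relative frequency and one-sixth. The sample sizes, the seed, and the variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed, used only for reproducibility
sample_sizes = [100, 1_000, 10_000, 100_000]
max_deviations = []

for n in sample_sizes:
    rolls = rng.integers(1, 7, size=n)
    # bincount tallies values 0..6; drop index 0 because a die has no face 0
    counts = np.bincount(rolls, minlength=7)[1:]
    proportions = counts / n
    # Largest absolute gap between an empirical proportion and the theoretical 1/6
    max_dev = np.abs(proportions - 1 / 6).max()
    max_deviations.append(max_dev)
    print(f"n = {n:>7,}: largest deviation from 1/6 = {max_dev:.4f}")
```

The deviation typically shrinks as `n` grows, although any single run can fluctuate, so the trend is clearest when comparing the smallest and largest sample sizes.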



Empirical Distributions from Random Samples