Learn Empirical Distributions from Random Samples | Random Sampling and Probabilistic Foundations
Simulation and Monte Carlo Modeling with Python

Empirical Distributions from Random Samples

Understanding how to build and interpret empirical distributions is a core skill in simulation-based inference. An empirical distribution is constructed by collecting a large number of outcomes from a random process—such as rolling a die or simulating a game—and then examining how frequently each outcome occurs. Unlike a theoretical distribution, which is derived from mathematical formulas and assumptions, an empirical distribution is based on actual data generated from random sampling. This approach allows you to visualize and analyze the behavior of complex systems, even when the underlying probabilities are unknown or difficult to compute.
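As a minimal sketch of this idea, the snippet below tabulates relative frequencies for the sum of two fair dice, a process whose probabilities are slightly less obvious than those of a single die. The helper name `empirical_pmf`, the seed, and the choice of 10,000 samples are illustrative assumptions, not part of the lesson.

```python
import numpy as np

def empirical_pmf(samples):
    """Return the distinct outcomes and the fraction of samples equal to each."""
    values, counts = np.unique(samples, return_counts=True)
    return values, counts / counts.sum()

# Seeded generator so the run is reproducible (the seed value is arbitrary)
rng = np.random.default_rng(0)

# Simulate the sum of two fair dice 10,000 times
rolls = rng.integers(1, 7, size=(10_000, 2))  # upper bound 7 is exclusive
sums = rolls.sum(axis=1)

values, freqs = empirical_pmf(sums)
for v, f in zip(values, freqs):
    print(f"sum={v:2d}  relative frequency={f:.3f}")
```

The printed frequencies are estimates built purely from the simulated data; no formula for the distribution of the sum is used anywhere.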

A key mathematical tool for describing empirical distributions is the empirical distribution function (EDF). For a sample of size $n$, the EDF at a value $x$ is given by:

$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \le x)$$

Here, $I(X_i \le x)$ is an indicator function that equals 1 if the sample point $X_i$ is less than or equal to $x$, and 0 otherwise. The EDF $F_n(x)$ represents the proportion of sample points less than or equal to $x$. This function provides a stepwise approximation to the true cumulative distribution and becomes more accurate as the sample size increases.
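The definition translates almost directly into NumPy. The sketch below is one possible implementation, with the function name `edf`, the seed, and the evaluation grid chosen here for illustration; it compares the EDF of simulated die rolls with the theoretical CDF of a fair die.

```python
import numpy as np

def edf(samples, x):
    """Evaluate F_n(x): the fraction of sample points that are <= x."""
    samples = np.asarray(samples)
    x = np.asarray(x)
    # Indicator I(X_i <= x) for every (query point, sample) pair, averaged over samples
    return (samples <= x[..., None]).mean(axis=-1)

rng = np.random.default_rng(1)  # arbitrary seed for reproducibility
samples = rng.integers(1, 7, size=10_000)

grid = np.arange(0, 8)  # evaluate at x = 0, 1, ..., 7
for x, fx in zip(grid, edf(samples, grid)):
    # For a fair die, the CDF at integer x is x/6, clipped to [0, 1]
    print(f"F_n({x}) = {fx:.3f}   theoretical CDF = {min(max(x, 0), 6) / 6:.3f}")
```

Taking the mean of a boolean array is the same as the $(1/n)\sum$ in the formula, since `True` counts as 1 and `False` as 0.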

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulate rolling a fair six-sided die 10,000 times
num_rolls = 10000
samples = np.random.randint(1, 7, size=num_rolls)

# Plot the empirical distribution as a histogram
plt.figure(figsize=(8, 5))
plt.hist(samples, bins=np.arange(1, 8) - 0.5, edgecolor='black', rwidth=0.8)
plt.title("Empirical Distribution of 10,000 Dice Rolls")
plt.xlabel("Dice Face")
plt.ylabel("Frequency")
plt.xticks(range(1, 7))
plt.show()
```

The histogram you see represents the empirical distribution of outcomes from 10,000 simulated dice rolls. Each bar shows how many times a particular face appeared. For a fair die, every face has probability one-sixth, so each bar should be close to 10,000 / 6 ≈ 1,667 rolls. With smaller samples, the relative frequencies fluctuate more due to random chance, but as you collect more data, they settle toward the theoretical values and the empirical distribution becomes a more accurate reflection of the true underlying probabilities. This illustrates how simulation can be used to approximate complex distributions and supports the principles you observed in the Law of Large Numbers.
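The effect of sample size can be checked numerically. The following sketch, which assumes an arbitrary seed and an illustrative set of sample sizes, measures the largest gap between the observed relative frequencies and the theoretical value of 1/6.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed for reproducibility
theoretical = np.full(6, 1 / 6)  # fair die: each face has probability 1/6

max_errors = {}
for n in (100, 1_000, 10_000, 100_000):
    samples = rng.integers(1, 7, size=n)
    # minlength guards against a face that never shows up; drop the unused 0 bin
    freqs = np.bincount(samples, minlength=7)[1:] / n
    max_errors[n] = np.abs(freqs - theoretical).max()
    print(f"n={n:>7,}  largest deviation from 1/6: {max_errors[n]:.4f}")
```

The deviation tends to shrink as `n` grows, though in any single run it need not decrease at every step, because each sample is random.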

Which statement best describes the difference between a theoretical distribution and an empirical distribution?

Section 1. Chapter 5
