Learn Empirical Distributions from Random Samples | Random Sampling and Probabilistic Foundations
Simulation and Monte Carlo Modeling with Python

Empirical Distributions from Random Samples

Understanding how to build and interpret empirical distributions is a core skill in simulation-based inference. An empirical distribution is constructed by collecting a large number of outcomes from a random process—such as rolling a die or simulating a game—and then examining how frequently each outcome occurs. Unlike a theoretical distribution, which is derived from mathematical formulas and assumptions, an empirical distribution is based on actual data generated from random sampling. This approach allows you to visualize and analyze the behavior of complex systems, even when the underlying probabilities are unknown or difficult to compute.

A key mathematical tool for describing empirical distributions is the empirical distribution function (EDF). For a sample of size $n$, the EDF at a value $x$ is given by:

$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \leq x)$$

Here, $I(X_i \leq x)$ is an indicator function that equals 1 if the sample point $X_i$ is less than or equal to $x$, and 0 otherwise. The EDF $F_n(x)$ represents the proportion of sample points less than or equal to $x$. This function provides a stepwise approximation to the true cumulative distribution function and becomes more accurate as the sample size increases.
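To make the definition concrete, here is a minimal sketch that evaluates the EDF directly from the formula. The function name `edf`, the small hand-made sample, and the seed are illustrative assumptions, not part of the lesson's own code.

```python
import numpy as np

def edf(samples, x):
    """Empirical distribution function F_n(x) for a scalar x (illustrative helper)."""
    samples = np.asarray(samples)
    # (samples <= x) is the indicator I(X_i <= x); its mean is (1/n) * sum of indicators
    return float(np.mean(samples <= x))

# Small hand-checkable sample (hypothetical values, n = 8)
small_sample = [2, 5, 3, 2, 6, 1, 3, 3]
print(edf(small_sample, 3))  # 6 of the 8 points are <= 3, so this prints 0.75

# Simulated fair die; the seed is arbitrary and only makes the run reproducible
rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=10_000)

# Compare the EDF with the theoretical CDF of a fair die, F(x) = x / 6
for face in range(1, 7):
    print(face, edf(rolls, face), face / 6)
```

For the simulated rolls, the EDF values should sit close to 1/6, 2/6, and so on up to 1, although the exact numbers depend on the random draw.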

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulate rolling a fair six-sided die 10,000 times
num_rolls = 10000
samples = np.random.randint(1, 7, size=num_rolls)

# Plot the empirical distribution as a histogram
plt.figure(figsize=(8, 5))
plt.hist(samples, bins=np.arange(1, 8) - 0.5, edgecolor='black', rwidth=0.8)
plt.title("Empirical Distribution of 10,000 Dice Rolls")
plt.xlabel("Dice Face")
plt.ylabel("Frequency")
plt.xticks(range(1, 7))
plt.show()
```

The histogram represents the empirical distribution of outcomes from 10,000 simulated dice rolls. Each bar shows how many times a particular face appeared. For a fair die, each face should appear about one-sixth of the time, so with 10,000 rolls each bar should be close to 1,667. With smaller samples, the relative frequencies fluctuate more due to random chance; as you collect more data, they settle toward the theoretical probabilities, and the empirical distribution becomes a more accurate reflection of the true underlying distribution. This illustrates how simulation can be used to approximate complex distributions, and it is consistent with the Law of Large Numbers.
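The effect of sample size can be checked numerically. The sketch below is an illustration under assumed choices: the helper `relative_frequencies`, the sample sizes, and the seed are hypothetical and not taken from the lesson. For each sample size it measures the largest gap between an observed relative frequency and the theoretical value 1/6.

```python
import numpy as np

rng = np.random.default_rng(42)   # arbitrary seed, for reproducibility only
theoretical = np.full(6, 1 / 6)   # fair-die probabilities for faces 1..6

def relative_frequencies(rolls):
    """Proportion of rolls showing each face 1..6 (illustrative helper)."""
    # bincount counts occurrences of 0..6; drop index 0, which a die never shows
    counts = np.bincount(rolls, minlength=7)[1:]
    return counts / len(rolls)

max_errors = {}
for n in (100, 1_000, 10_000, 100_000):
    rolls = rng.integers(1, 7, size=n)
    freqs = relative_frequencies(rolls)
    # Largest absolute deviation from 1/6 across the six faces
    max_errors[n] = float(np.max(np.abs(freqs - theoretical)))
    print(f"n={n:>7}: max |freq - 1/6| = {max_errors[n]:.4f}")
```

The deviation typically shrinks as `n` grows, roughly in proportion to one over the square root of `n`, though any single run can show small reversals between neighboring sample sizes.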

Which statement best describes the difference between a theoretical distribution and an empirical distribution?

Section 1. Chapter 5
