Empirical Distributions from Random Samples
Understanding how to build and interpret empirical distributions is a core skill in simulation-based inference. An empirical distribution is constructed by collecting a large number of outcomes from a random process—such as rolling a die or simulating a game—and then examining how frequently each outcome occurs. Unlike a theoretical distribution, which is derived from mathematical formulas and assumptions, an empirical distribution is based on actual data generated from random sampling. This approach allows you to visualize and analyze the behavior of complex systems, even when the underlying probabilities are unknown or difficult to compute.
A key mathematical tool for describing empirical distributions is the empirical distribution function (EDF). For a sample of size n, the EDF at a value x is given by:
$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \le x)$$

Here, $I(X_i \le x)$ is an indicator function that equals 1 if the sample point $X_i$ is less than or equal to $x$, and 0 otherwise. The EDF $F_n(x)$ represents the proportion of sample points less than or equal to $x$. This function provides a stepwise approximation to the true cumulative distribution function and becomes more accurate as the sample size increases.
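The EDF formula above can be computed directly. The following is a minimal sketch rather than a definitive implementation: the helper name `edf`, the seed, and the use of `np.searchsorted` are illustrative choices, not part of the original lesson.

```python
import numpy as np

def edf(samples, x):
    """Evaluate the empirical distribution function F_n at x.

    `edf` is a hypothetical helper name used only for this sketch.
    x may be a scalar or an array of evaluation points.
    """
    sorted_samples = np.sort(np.asarray(samples))
    # With side="right", searchsorted returns the number of sample
    # points that are <= x, which is the sum of the indicators.
    count_le_x = np.searchsorted(sorted_samples, x, side="right")
    return count_le_x / sorted_samples.size

# Assumption: a seeded generator, so the run is reproducible.
rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=10_000)  # faces 1..6 (upper bound exclusive)

# For a fair die the true CDF at face k is k/6; compare by eye.
for face in range(1, 7):
    print(f"F_n({face}) = {edf(rolls, face):.4f}   k/6 = {face / 6:.4f}")
```

Sorting once and counting with `searchsorted` avoids an explicit loop over the sample, but a plain `np.mean(samples <= x)` would give the same value for a single `x`.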
```python
import numpy as np
import matplotlib.pyplot as plt

# Simulate rolling a fair six-sided die 10,000 times
num_rolls = 10000
samples = np.random.randint(1, 7, size=num_rolls)

# Plot the empirical distribution as a histogram
plt.figure(figsize=(8, 5))
plt.hist(samples, bins=np.arange(1, 8) - 0.5, edgecolor='black', rwidth=0.8)
plt.title("Empirical Distribution of 10,000 Dice Rolls")
plt.xlabel("Dice Face")
plt.ylabel("Frequency")
plt.xticks(range(1, 7))
plt.show()
```
The histogram shows the empirical distribution of outcomes from 10,000 simulated dice rolls. Each bar shows how many times a particular face appeared. As the sample size increases, the proportion of rolls landing on each face approaches the value predicted by the theoretical distribution: for a fair die, one-sixth per face, or roughly 1,667 rolls out of 10,000. With smaller samples the proportions fluctuate more due to random chance, but as you collect more data, the empirical distribution becomes a more accurate reflection of the true underlying probabilities. This illustrates how simulation can be used to approximate complex distributions, and it reflects the principles you observed in the Law of Large Numbers.
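To see the effect of sample size numerically, the sketch below compares relative frequencies against 1/6 for several sample sizes. The helper name `relative_frequencies`, the seed, and the particular sample sizes are assumptions made for illustration. Because the rolls are random, the deviation typically shrinks as the sample grows, but it is not guaranteed to decrease at every step.

```python
import numpy as np

def relative_frequencies(samples, faces=6):
    """Proportion of rolls showing each face 1..faces.

    `relative_frequencies` is a hypothetical helper for this sketch.
    """
    samples = np.asarray(samples)
    # bincount tallies values 0..faces; drop index 0 since a die has no face 0.
    counts = np.bincount(samples, minlength=faces + 1)[1:]
    return counts / samples.size

# Assumption: a seeded generator, so the run is reproducible.
rng = np.random.default_rng(42)

for n in (60, 600, 6_000, 60_000):
    rolls = rng.integers(1, 7, size=n)
    freqs = relative_frequencies(rolls)
    # Largest gap between an observed proportion and the fair-die value 1/6.
    max_dev = np.max(np.abs(freqs - 1 / 6))
    print(f"n={n:>6}  largest deviation from 1/6: {max_dev:.4f}")
```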