Probability Theory Mastering
Law of Large Numbers for Bernoulli Process
A Bernoulli trial is a statistical experiment with only two possible outcomes, usually success and failure, with fixed probabilities of occurrence on each trial. It was considered in more detail in the Probability Theory Basics course.
In a Bernoulli process, each trial is independent, meaning the outcome of one trial does not affect the outcome of any other trial. The probability of success, denoted by $p$, is the same for every trial. The probability of failure is denoted by $q = 1 - p$.
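A Bernoulli process is straightforward to simulate: draw independent uniform numbers on $[0, 1)$ and count a trial as a success when its number is below $p$. The sketch below assumes NumPy's `default_rng` generator; the helper name `bernoulli_process` and the seed are hypothetical, chosen only for illustration.

```python
import numpy as np

def bernoulli_process(n, p, seed=None):
    """Hypothetical helper: simulate n independent Bernoulli(p) trials.

    Returns an array of 1s (success) and 0s (failure).
    """
    rng = np.random.default_rng(seed)
    # Uniform draws are independent, and P(U < p) = p on every draw,
    # so each trial succeeds with the same probability p.
    return (rng.random(n) < p).astype(int)

trials = bernoulli_process(10, p=0.3, seed=42)
print(trials)
```

NumPy's `rng.binomial(1, p, size=n)` draws from the same distribution directly. The comparison form is used here only because it makes the "success with probability $p$" rule explicit.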
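Let's apply the law of large numbers to this scheme. Suppose we conduct $n$ experiments and count how many of them succeed. By the law of large numbers, the fraction of successful experiments approaches the probability of success as the number of experiments grows:
$$\frac{X_1 + X_2 + \dots + X_n}{n} \to p \quad \text{as } n \to \infty.$$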
Each variable $X_i$ in the numerator represents the outcome of one experiment: $X_i = 1$ if the experiment succeeds (with probability $p$) and $X_i = 0$ if it fails (with probability $1 - p$).
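The distribution series of a single indicator $X_i$, together with its expectation, can be written out directly. This is the standard computation for a Bernoulli variable, shown here as a short worked derivation using $q = 1 - p$.

```latex
\begin{array}{c|cc}
X_i        & 0         & 1 \\ \hline
P(X_i = x) & q = 1 - p & p
\end{array}
\qquad
E[X_i] = 0 \cdot q + 1 \cdot p = p
```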
In this case, the conditions of the law of large numbers are met: the variables are independent (because the experiments are independent), identically distributed, and have a finite expectation (each $X_i$ takes only the values 0 and 1, so $E[X_i] = p$).
Therefore, we can use the law of large numbers to estimate the probability of an event by its relative frequency: the proportion of trials in which the event occurred, taken over a large number of independent trials.
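As a numerical check of this idea, the sketch below computes the relative frequency of successes for several sample sizes and prints how far each estimate lies from $p$. The sample sizes, the seed, and the use of `default_rng` are illustrative assumptions. With a different seed the individual numbers change, but the error typically shrinks as $n$ grows.

```python
import numpy as np

p = 0.3
rng = np.random.default_rng(0)
# One long run of independent Bernoulli(p) trials (1 = success, 0 = failure)
trials = (rng.random(100_000) < p).astype(int)

for n in [10, 100, 1_000, 10_000, 100_000]:
    # (X_1 + ... + X_n) / n is the fraction of successes among the first n trials
    freq = trials[:n].sum() / n
    print(f"n = {n:>6}: frequency = {freq:.4f}, |frequency - p| = {abs(freq - p):.4f}")
```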
For example, consider a biased coin whose center of gravity is shifted, so heads and tails are not equally likely. Our goal is to estimate the probability that it lands heads up. Check out the code below:
```python
import numpy as np
import matplotlib.pyplot as plt

# Set the probability of heads to 0.3
p = 0.3
# Generate 2000 flips of the coin with probability of heads equal to `p` (1 = heads, 0 = tails)
coin_flips = np.random.choice([1, 0], size=2000, p=[p, 1-p])

# Function that calculates the mean value of the first `subsample_size` elements
def mean_value(data, subsample_size):
    return data[:subsample_size].mean()

# Visualizing the results: estimate p from the first 1, 2, ..., 2000 flips
x = np.arange(1, 2001)
y = np.zeros(2000)
for i in range(2000):
    y[i] = mean_value(coin_flips, x[i])

plt.plot(x, y, label='Estimated probability')
plt.xlabel('Number of flips used to estimate the probability')
plt.ylabel('Probability of success')
plt.axhline(y=p, color='k', label='Real probability of success')
plt.legend()
plt.show()
```
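The loop in the coin example re-slices the array and recomputes the mean for every prefix, which takes time quadratic in the number of flips. One possible alternative, sketched below under the same assumptions ($p = 0.3$, 2000 flips), computes all running frequencies at once with `np.cumsum`: the $k$-th cumulative sum is the number of heads among the first $k$ flips, and dividing it by $k$ gives the same estimate. The helper name `running_frequency` is hypothetical.

```python
import numpy as np

def running_frequency(flips):
    """Hypothetical helper: return (X_1 + ... + X_k) / k for every k = 1, ..., len(flips)."""
    flips = np.asarray(flips)
    # k-th cumulative sum = number of successes among the first k trials
    return np.cumsum(flips) / np.arange(1, len(flips) + 1)

p = 0.3
coin_flips = np.random.choice([1, 0], size=2000, p=[p, 1 - p])
estimates = running_frequency(coin_flips)
# Estimates after 10, 100 and 2000 flips
print(estimates[[9, 99, 1999]])
```

The resulting array can be plotted with `plt.plot(np.arange(1, 2001), estimates)` in place of the loop-built `y`.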
Similarly, the law of large numbers extends to the multinomial scheme, in which each trial has more than two possible outcomes: we encode the outcome (or outcomes) of interest as 1 and all other outcomes as 0, which turns each trial into a Bernoulli trial. Let's look at an example:
```python
import numpy as np
import matplotlib.pyplot as plt

# Our distribution with 4 possible values
outcomes = ['Red', 'Blue', 'Black', 'Green']
# Probabilities of corresponding values
probs = [0.3, 0.2, 0.4, 0.1]
# Generate samples
samples = np.random.choice(outcomes, size=2000, p=probs)

# Suppose we want to determine the probability of occurrence of red or black colors.
# Let's transform the data in such a way that 1 stands in place of 'Red' and 'Black' colors,
# and 0 in place of other colors
encoded_samples = np.where(np.logical_or(samples == 'Red', samples == 'Black'), 1, 0)

# Function that calculates the mean value of the first `subsample_size` elements
def mean_value(data, subsample_size):
    return data[:subsample_size].mean()

# Visualizing the results: estimate the probability from the first 1, 2, ..., 2000 samples
x = np.arange(1, 2001)
y = np.zeros(2000)
for i in range(2000):
    y[i] = mean_value(encoded_samples, x[i])

plt.plot(x, y, label='Estimated probability')
plt.xlabel('Number of samples used to estimate the probability')
plt.ylabel('Probability of success')
# The outcomes are mutually exclusive, so P(Red or Black) = P(Red) + P(Black)
plt.axhline(y=probs[0] + probs[2], color='k', label='Real probability of success')
plt.legend()
plt.show()
```
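The same reasoning applies to every outcome of a multinomial scheme at once: for each color, the indicator "this trial produced the color" is a Bernoulli variable, so its frequency approaches that color's probability. The sketch below assumes the same four colors and probabilities as in the color example, a larger illustrative sample of 100,000 draws, and a fixed seed. It counts each outcome with `np.unique(..., return_counts=True)`.

```python
import numpy as np

outcomes = ['Red', 'Blue', 'Black', 'Green']
probs = [0.3, 0.2, 0.4, 0.1]
n = 100_000
rng = np.random.default_rng(1)
samples = rng.choice(outcomes, size=n, p=probs)

# Count how many times each distinct value occurs, then convert counts to frequencies
values, counts = np.unique(samples, return_counts=True)
freqs = dict(zip(values, counts / n))

for color, prob in zip(outcomes, probs):
    # A color that never appeared would be missing from freqs, hence .get(..., 0.0)
    print(f"{color:>5}: estimated {freqs.get(color, 0.0):.4f}, real {prob}")

# Frequencies of mutually exclusive outcomes add up, just like their probabilities
print("Red or Black:", freqs.get('Red', 0.0) + freqs.get('Black', 0.0))
```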