Learn Law of Large Numbers | The Limit Theorems of Probability Theory

The Law of Large Numbers is a fundamental concept in probability theory and statistics. It states that as the sample size increases, the average of the observed values (the sample mean) converges to the expected value, or mean, of the underlying distribution, provided that this expected value exists.

Mathematical definition of the law
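One common way to write the (weak) law, consistent with the conditions explained below, is the following. The symbols X_i, μ and the bar notation for the sample mean are our own labels:

```latex
X_1, X_2, \dots \ \text{i.i.d.}, \quad \mathbb{E}|X_1| < \infty, \quad \mu = \mathbb{E}[X_1]
\;\Longrightarrow\;
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{\;p\;} \mu \quad (n \to \infty)
```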

Let's unpack the conditions and the conclusion of this law:

The first condition is that we have a sequence of random variables that are independent and identically distributed (i.i.d.). Independent means that the value of one variable carries no information about the others; identically distributed means that all of them follow the same probability distribution. For instance, N(1, 2) and N(1, 3) are not identically distributed: although both are Gaussian with mean 1, they have different variances;
The second condition is that these variables must have a finite expectation. This means the series or integral that defines the expectation must converge to a finite number, as discussed in Chapter 2 of the first section;
The law of large numbers states that if both conditions are met, then as we average more and more of these variables, the average gets closer to the true expectation.
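The third point can be checked numerically. The sketch below is only an illustration under our own assumptions: it draws from an exponential distribution with mean 1 (any i.i.d. distribution with a finite expectation would do), uses a tolerance eps = 0.1, and relies on a hypothetical helper `deviation_probability`. It estimates how often the sample mean misses the true mean by more than eps, a frequency that should shrink as n grows.

```python
import numpy as np

def deviation_probability(n, eps=0.1, trials=2000, seed=0):
    """Estimate P(|sample mean - mu| > eps) for n i.i.d. Exponential(1) draws (mu = 1).

    Hypothetical helper for illustration; the distribution and eps are assumptions.
    """
    rng = np.random.default_rng(seed)
    # Each row is one experiment made of n independent, identically distributed draws
    draws = rng.exponential(scale=1.0, size=(trials, n))
    sample_means = draws.mean(axis=1)
    # Fraction of experiments whose sample mean misses mu = 1 by more than eps
    return float(np.mean(np.abs(sample_means - 1.0) > eps))

for n in [10, 100, 1000]:
    print(f"n={n:4d}  P(|mean - 1| > 0.1) ~= {deviation_probability(n):.3f}")
```

The estimated probability drops sharply between n = 10 and n = 1000, which is exactly what "the average gets closer to the true expectation" means in the probabilistic sense.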

Note

In the law's statement, you might see the letter 'p' above the arrow. It denotes convergence in probability, one of several ways in which a sequence of random variables can converge. To understand the law of large numbers in practice, you don't need the details of this type of convergence, so we won't go into it in this course.

The visualization of the law

To check that the law of large numbers holds, run each code example several times. The first example shuffles the samples on every run, so the values are summed in a different order each time. If the law holds, the running average consistently tends toward the actual expectation, regardless of the order in which the variables are summed.


import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Import the dataset of Gaussian samples
samples = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/Advanced+Probability+course+media/gaussian_samples.csv', names=['Value'])

# Shuffle the samples, so each run sums them in a different order
samples = samples.sample(frac=1)

# Mean of the first `subsample_size` values
def mean_value(data, subsample_size):
    return data['Value'].iloc[:subsample_size].mean()

# Subsample sizes 1, 2, ..., 4999 (the mean of zero elements is undefined)
x = np.arange(1, 5000)
y = np.zeros(len(x))

# Calculate the mean for each subsample size
for i, n in enumerate(x):
    y[i] = mean_value(samples, n)

# Plot the estimated mean against the real mean (0 for this dataset)
plt.plot(x, y, label='Estimated mean')
plt.xlabel('Number of elements to calculate mean value')
plt.ylabel('Mean value')
plt.axhline(y=0, color='k', label='Real mean')
plt.legend()
plt.show()

The plot above shows that the more terms we take, the closer the estimated mean gets to the real mean: the variance of the estimate decreases.
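The shrinking variance can be made quantitative: for i.i.d. variables with variance σ², the sample mean of n of them has variance σ²/n. The sketch below assumes standard normal draws (σ = 1, a Gaussian centred at 0 like the dataset above) and a hypothetical helper `sample_mean_variance`. It repeats the experiment many times for each n and compares the empirical variance of the sample means with σ²/n.

```python
import numpy as np

def sample_mean_variance(n, sigma=1.0, trials=5000, seed=42):
    """Empirical variance of the mean of n i.i.d. N(0, sigma^2) draws.

    Hypothetical helper for illustration; the normal distribution is an assumption.
    """
    rng = np.random.default_rng(seed)
    # `trials` independent sample means, each computed from n draws
    means = rng.normal(loc=0.0, scale=sigma, size=(trials, n)).mean(axis=1)
    return float(means.var())

for n in [10, 100, 1000]:
    print(f"n={n:4d}  empirical={sample_mean_variance(n):.5f}  sigma^2/n={1.0 / n:.5f}")
```

Each tenfold increase in n should cut the variance of the estimate by roughly a factor of ten.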

Now let's take samples from the Cauchy distribution and check whether the law of large numbers works for this distribution (don't forget to run the code several times and compare the results):


from scipy.stats import cauchy
import matplotlib.pyplot as plt
import numpy as np

# Generate 5000 samples from a Cauchy distribution with location 0.
# The location is the median (center) of the distribution; its mean does not exist.
loc = 0
samples = cauchy.rvs(loc=loc, size=5000)

# Mean of the first `subsample_size` values
def mean_value(data, subsample_size):
    return data[:subsample_size].mean()

# Subsample sizes 1, 2, ..., 4999 (the mean of zero elements is undefined)
x = np.arange(1, 5000)
y = np.zeros(len(x))
for i, n in enumerate(x):
    y[i] = mean_value(samples, n)

# Plot the estimated mean; there is no real mean to draw for comparison
plt.plot(x, y, label='Estimated mean')
plt.xlabel('Number of elements to calculate mean value')
plt.ylabel('Mean value')
plt.legend()
plt.show()

In the first case, the estimated mean always approaches zero, regardless of the order of summation, and its fluctuations around zero decrease as more terms are added.

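In the second case, however, the estimated mean does not converge: it behaves unpredictably and jumps whenever an extreme value appears. This is because the Cauchy distribution has no finite expectation, which violates the second condition of the law of large numbers. In fact, the average of n i.i.d. standard Cauchy variables has the same standard Cauchy distribution as a single variable, so averaging does not reduce the spread at all.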
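To see this numerically, the sketch below compares sample means of standard normal and standard Cauchy draws, generated with NumPy's random `Generator`; the helper names are hypothetical. Because the Cauchy variance is undefined, the spread is measured with the interquartile range (IQR) of many independent sample means instead. For the normal case the IQR should shrink roughly like 1/sqrt(n); for the Cauchy case it should stay close to 2, the IQR of a standard Cauchy distribution, for every n.

```python
import numpy as np

def normal_draws(rng, shape):
    # Standard normal samples: finite mean, so the law of large numbers applies
    return rng.standard_normal(shape)

def cauchy_draws(rng, shape):
    # Standard Cauchy samples: no finite mean, so the law does not apply
    return rng.standard_cauchy(shape)

def iqr_of_sample_means(draw, n, trials=4000, seed=0):
    """Interquartile range of `trials` sample means, each computed from n i.i.d. draws."""
    rng = np.random.default_rng(seed)
    means = draw(rng, (trials, n)).mean(axis=1)
    q25, q75 = np.percentile(means, [25, 75])
    return float(q75 - q25)

for n in [1, 10, 100, 1000]:
    print(f"n={n:4d}  normal IQR={iqr_of_sample_means(normal_draws, n):.3f}  "
          f"Cauchy IQR={iqr_of_sample_means(cauchy_draws, n):.3f}")
```

The IQR is used rather than the variance because it is defined for every distribution, including ones with heavy tails like the Cauchy.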

Section 2. Chapter 1
