Course Content
Advanced Probability Theory
Central Limit Theorem
The central limit theorem is a fundamental theorem of statistics. It states that the sum of a large number of independent, identically distributed random variables is approximately normally distributed, regardless of the underlying distribution of the individual variables.
Theorem formulation
Formally, the theorem can be stated as follows. Let X_1, X_2, ..., X_n be independent, identically distributed random variables with mean μ and finite variance σ², and let S_n = X_1 + X_2 + ... + X_n. Then

(S_n − n·μ) / (σ·√n)  →ᵈ  N(0, 1)  as n → ∞
As in the law of large numbers, the formulation of the central limit theorem contains a letter 'd' above the arrow. It denotes the so-called convergence in distribution. In simple words, it can be interpreted as follows: the more terms we add, the closer the PDF of their sum gets to the PDF of a Gaussian distribution.
Instead of the last line in the formulation above, another form is often used:

S_n ≈ N(n·μ, n·σ²)

In this formulation, we no longer talk about convergence. Instead, we state directly that the sum follows a Gaussian distribution with specific parameters. However, it's important to note that this approximation only holds for large values of n.
For each specific distribution the required value of n differs, but generally, if n is at least 35, the approximation works with reasonably high accuracy.
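This rule of thumb is easy to probe numerically. The sketch below (assuming NumPy; the exponential distribution with rate 1 and n = 35 are arbitrary choices for illustration) standardizes sums of 35 exponential variables and checks what fraction of the standardized values falls within ±1.96, which for a Gaussian would be about 95%:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 35            # number of terms per sum
trials = 100_000  # number of simulated sums
lam = 1.0         # rate of the exponential distribution (arbitrary choice)

# Each row is one sample of n exponential variables; sum along rows
sums = rng.exponential(1 / lam, size=(trials, n)).sum(axis=1)

# Standardize using the exact mean and variance of the sum:
# E[S_n] = n/lam, Var[S_n] = n/lam^2
z = (sums - n / lam) / np.sqrt(n / lam**2)

# If the approximation is good, the fraction should be close to 0.95
coverage = np.mean(np.abs(z) < 1.96)
print(round(coverage, 2))
```

Even though the exponential distribution is heavily skewed, at n = 35 the standardized sums already behave almost like a standard Gaussian.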
Illustration of the theorem
Take a look at the illustration below: we'll calculate the PDF of the sum of uniformly distributed variables. As shown in the illustration, the resulting PDF becomes more similar to a Gaussian PDF as we use more and more terms to calculate the sum.
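The same effect can be quantified without a plot. One simple measure is the excess kurtosis, which is 0 for a Gaussian and −1.2 for a single uniform variable. The sketch below (assuming NumPy; the sample sizes are arbitrary choices) shows the excess kurtosis of the sum moving toward 0 as the number of terms grows:

```python
import numpy as np

rng = np.random.default_rng(42)
trials = 200_000  # number of simulated sums per setting

def excess_kurtosis(x):
    """Sample excess kurtosis: 0 for a Gaussian distribution."""
    z = (x - x.mean()) / x.std()
    return np.mean(z**4) - 3

# Sum of n Uniform(0, 1) variables for increasing n
kurt = {}
for n in (1, 2, 5, 30):
    sums = rng.uniform(0, 1, size=(trials, n)).sum(axis=1)
    kurt[n] = excess_kurtosis(sums)
    print(n, round(kurt[n], 2))
```

In theory the excess kurtosis of a sum of n uniforms is −1.2/n, so each additional term brings the distribution of the sum closer to Gaussian.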
Now let's look at the PMF of the sum of Binomial variables:
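For binomial variables the comparison can even be made exactly: the sum of n independent Binomial(m, p) variables is itself Binomial(n·m, p), so we can compare its exact PMF against the matching Gaussian PDF. Below is a sketch using only the standard library (the parameters m = 10 and p = 0.3 are arbitrary choices):

```python
import math

def binom_pmf(k, n, p):
    """Exact PMF of a Binomial(n, p) variable at k."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def gauss_pdf(x, mean, var):
    """PDF of a Gaussian with the given mean and variance at x."""
    return math.exp(-(x - mean)**2 / (2 * var)) / math.sqrt(2 * math.pi * var)

m, p = 10, 0.3  # parameters of one binomial term (arbitrary choice)

# The sum of n iid Binomial(m, p) variables is Binomial(n*m, p), so we can
# measure the worst-case gap between its exact PMF and the Gaussian PDF
errs = {}
for n in (1, 5, 50):
    N = n * m
    mean, var = N * p, N * p * (1 - p)
    errs[n] = max(abs(binom_pmf(k, N, p) - gauss_pdf(k, mean, var))
                  for k in range(N + 1))
    print(n, round(errs[n], 4))
```

The maximum gap shrinks steadily as n grows, which is exactly what the theorem predicts.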
CLT implementation
We'll create 500 samples, each containing 100 random variables from a Poisson distribution. For each of these 500 samples, we'll calculate the sum of its random variables and create a histogram from the resulting 500 values. Then, we'll compare this histogram with the PDF plot of a Gaussian random variable.
```python
import numpy as np
import matplotlib.pyplot as plt

# List to store the sum of samples from each iteration
hist_samples = []

# Generate 500 samples and calculate the sum of random variables in each sample
for i in range(500):
    # Generate 100 random variables from a Poisson distribution with mean 4
    generated_samples = np.random.poisson(4, 100)
    # Calculate the sum and append it to hist_samples
    hist_samples.append(generated_samples.sum())

# Plot a histogram of the samples and the PDF of the Gaussian distribution
fig, axes = plt.subplots(1, 2)  # Create subplots
fig.set_size_inches(10, 5)      # Set the size of the figure

# Plot the histogram on the first subplot
axes[0].hist(hist_samples, bins=10, alpha=0.5, edgecolor='black', density=True)
axes[0].set_title('Histogram of Sum of Poisson Values')

# Parameters of the matching Gaussian distribution:
# the mean of one Poisson variable is 4, so the mean of the sum is 400;
# the variance of one Poisson variable is 4, so the variance of the sum is 400
# and the standard deviation is 20
mean = 400
std = 20

# Define the range of x values for the plot
x = np.linspace(mean - 3 * std, mean + 3 * std, 500)

# Calculate the PDF of the Gaussian distribution
pdf = (1 / (std * np.sqrt(2 * np.pi))) * np.exp(-((x - mean)**2) / (2 * std**2))

# Plot the PDF on the second subplot
axes[1].plot(x, pdf)
axes[1].set_title('Gaussian Distribution with Mean = {} and Variance = {}'.format(mean, std**2))

plt.show()  # Display the plot
```
We can observe that the resulting histogram closely matches the PDF of the Gaussian distribution. This illustrates the theorem in action and demonstrates its applicability in real-world scenarios!