Central Limit Theorem | The Limit Theorems of Probability Theory
Central Limit Theorem

The central limit theorem (CLT) is a fundamental theorem of probability and statistics. It states that the sum of a large number of independent, identically distributed random variables with finite variance is approximately normally distributed, regardless of the underlying distribution of the individual random variables.

Theorem formulation

A formal description of the theorem can be presented as follows:
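One standard way to write the statement is sketched below. The symbols X_i, μ, σ² and S_n are notation assumed here for illustration; the last line is the convergence statement.

```latex
\begin{gather*}
X_1, X_2, \dots, X_n \ \text{independent and identically distributed}, \quad
\mathbb{E}[X_i] = \mu, \quad \operatorname{Var}(X_i) = \sigma^2 < \infty \\
S_n = X_1 + X_2 + \dots + X_n \\
\frac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{\; d \;} \mathcal{N}(0, 1) \quad \text{as } n \to \infty
\end{gather*}
```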

As in the law of large numbers, the statement of the Central Limit Theorem has a letter 'd' above the arrow. This letter denotes convergence in distribution. In simple terms, the more terms we add, the closer the distribution of the standardized sum gets to a Gaussian distribution, so its PDF looks more and more like the Gaussian PDF.
Instead of the last line of the formulation above, another form is often used:

In this formulation, we no longer talk about convergence. Instead, we state directly that the sum approximately follows a Gaussian distribution with specific parameters. However, it's important to note that this approximation only holds for large values of n.
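Written out, the approximate form is commonly given as follows, where μ and σ² denote the mean and variance of a single term and S_n the sum of n terms (notation assumed here):

```latex
S_n = X_1 + X_2 + \dots + X_n \;\overset{\text{approx.}}{\sim}\; \mathcal{N}\!\left(n\mu,\; n\sigma^2\right) \quad \text{for large } n
```

For instance, for the sum of 100 independent Poisson variables with mean 4 (so μ = 4 and σ² = 4), this gives approximately a Gaussian distribution with mean 400 and variance 400, that is, a standard deviation of 20.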

The required value of n depends on the underlying distribution. As a rough rule of thumb, the approximation is reasonably accurate once n is at least about 35, although strongly skewed or heavy-tailed distributions can need more terms.
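To get a feel for how the quality of the approximation depends on n, here is a minimal sketch that tracks the skewness of the sum, which is 0 for a Gaussian distribution. The exponential distribution, the values of n, the seed, and the helper name `skewness_of_sums` are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable


def skewness_of_sums(n, trials=20_000):
    """Sample skewness of the sum of n independent Exponential(1) variables."""
    sums = rng.exponential(scale=1.0, size=(trials, n)).sum(axis=1)
    z = (sums - sums.mean()) / sums.std()  # standardize the simulated sums
    return float(np.mean(z ** 3))


# The exact skewness of such a sum is 2 / sqrt(n), so it shrinks slowly with n
skews = {n: skewness_of_sums(n) for n in (1, 5, 35, 200)}
for n, s in skews.items():
    print(f"n = {n:3d}  sample skewness = {s:.3f}  theory = {2 / np.sqrt(n):.3f}")
```

The skewness decreases only like 1/sqrt(n), which is one reason a single threshold for n cannot fit every distribution.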

Illustration of the theorem

Take a look at the illustration below: we'll calculate the PDF of the sum of uniformly distributed variables. As shown in the illustration, the resulting PDF becomes more similar to a Gaussian PDF as we use more and more terms to calculate the sum.
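The same effect can be reproduced numerically. The sketch below approximates the PDF of a sum of n independent U(0, 1) variables by repeated discrete convolution and measures its largest gap to the Gaussian PDF with the same mean and variance. The interval (0, 1), the grid step, and the function names are assumptions made for this example.

```python
import numpy as np


def uniform_sum_pdf(n, dx=0.001):
    """Approximate the PDF of the sum of n independent U(0, 1) variables on a grid."""
    base = np.ones(int(round(1 / dx)))  # U(0, 1) density, one value per grid cell
    pdf = base.copy()
    for _ in range(n - 1):
        # Discrete approximation of the convolution integral
        pdf = np.convolve(pdf, base) * dx
    # Each cell is represented by its centre, hence the n / 2 offset
    x = (np.arange(len(pdf)) + n / 2) * dx
    return x, pdf


def gaussian_pdf(x, mean, var):
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)


def max_gap(n):
    """Largest absolute difference between the sum's PDF and the matching Gaussian PDF."""
    x, pdf = uniform_sum_pdf(n)
    # A U(0, 1) variable has mean 1/2 and variance 1/12
    return float(np.max(np.abs(pdf - gaussian_pdf(x, n / 2, n / 12))))


gaps = {n: max_gap(n) for n in (1, 2, 4, 8)}
for n, gap in gaps.items():
    print(f"n = {n}: max |pdf - gaussian| = {gap:.4f}")
```

Plotting `x` against `pdf` for each n gives a flat line for n = 1, a triangle for n = 2, and an increasingly bell-shaped curve after that.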

Now let's look at the PMF of the sum of Binomial variables:
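For a discrete case, the PMF of a sum can be computed exactly by convolving the PMF of one term with itself. The following sketch does this for binomial terms and compares the result with the Gaussian PDF evaluated at the integers. The parameters m = 5 and p = 0.3 and the helper names are assumed for illustration only.

```python
import numpy as np
from math import comb


def binomial_pmf(m, p):
    """PMF of a Binomial(m, p) variable on the values 0..m."""
    return np.array([comb(m, j) * p**j * (1 - p) ** (m - j) for j in range(m + 1)])


def sum_pmf(pmf, n):
    """PMF of the sum of n independent variables that share the given PMF."""
    out = np.array([1.0])  # PMF of the constant 0
    for _ in range(n):
        out = np.convolve(out, pmf)
    return out


m, p = 5, 0.3  # illustrative parameters (assumed)
pmf_gaps = {}
for n in (1, 5, 30):
    pmf = sum_pmf(binomial_pmf(m, p), n)
    k = np.arange(len(pmf))
    mean, var = n * m * p, n * m * p * (1 - p)
    normal = np.exp(-(k - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    pmf_gaps[n] = float(np.max(np.abs(pmf - normal)))
    print(f"n = {n:2d}: max |pmf - gaussian| = {pmf_gaps[n]:.4f}")
```

Because a sum of n independent Binomial(m, p) variables is itself Binomial(n·m, p), the convolution result can be cross-checked against `binomial_pmf(n * m, p)`.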

CLT implementation

We'll create 500 samples, each containing 100 random variables from a Poisson distribution with mean 4.
For each of these 500 samples, we'll calculate the sum of its random variables and create a histogram from the resulting 500 values. Then, we'll compare this histogram with a PDF plot of a Gaussian random variable.

import numpy as np
import matplotlib.pyplot as plt

# List to store the sum of samples from each iteration
hist_samples = []

# Generate 500 samples and calculate the sum of random variables in each sample
for i in range(500):
    generated_samples = np.random.poisson(4, 100)  # Generate 100 random variables from a Poisson distribution with mean 4
    hist_samples.append(generated_samples.sum())  # Calculate the sum and append it to hist_samples

# Plot a histogram of the samples and pdf of Gaussian distribution
fig, axes = plt.subplots(1, 2)  # Create subplots
fig.set_size_inches(10, 5)  # Set the size of the figure

# Plot histogram on the first subplot
axes[0].hist(hist_samples, bins=10, alpha=0.5, edgecolor='black', density=True)
axes[0].set_title('Histogram of Sum of Poisson Values')  # Set title for the first subplot

# Parameters for Gaussian distribution
mean = 400  # Mean of one Poisson variable is 4, mean of sum is 400
std = 20  # Variance of one Poisson variable is 4, variance of sum 400, std 20

# Define the range of x values for the plot
x = np.linspace(mean - 3 * std, mean + 3 * std, 500)

# Calculate the pdf of the Gaussian distribution
pdf = (1 / (std * np.sqrt(2 * np.pi))) * np.exp(-((x - mean)**2) / (2 * std**2))

# Plot the pdf on the second subplot
axes[1].plot(x, pdf)
axes[1].set_title('Gaussian Distribution with Mean = {} and Variance = {}'.format(mean, std**2))  # Set title for the second subplot

plt.show()  # Display the plot

We can observe that the resulting histogram closely matches the PDF of the Gaussian distribution. A simulation like this does not prove the theorem, but it illustrates that the Gaussian approximation works well for this sample size.
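The visual comparison can be backed by a few numbers. This is a minimal sketch that repeats the simulation with a seeded generator and compares summary statistics of the sums with the Gaussian parameters; the seed and variable names are assumptions, not part of the original code.

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen arbitrarily for repeatability

# 500 samples of 100 Poisson(4) variables each, summed per sample
sums = rng.poisson(lam=4, size=(500, 100)).sum(axis=1)

expected_mean = 100 * 4          # n * mu
expected_std = np.sqrt(100 * 4)  # sqrt(n * sigma^2)

sample_mean = sums.mean()
sample_std = sums.std(ddof=1)
# Share of sums within one standard deviation of the mean (about 0.68 for a Gaussian)
within_one_std = np.mean(np.abs(sums - expected_mean) <= expected_std)

print(f"sample mean = {sample_mean:.1f} (expected {expected_mean})")
print(f"sample std  = {sample_std:.1f} (expected {expected_std:.0f})")
print(f"share within one std = {within_one_std:.2f}")
```

The sample statistics fluctuate from run to run, so they should land near the expected values rather than match them exactly.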

What does the letter 'd' above the arrow mean in the definition of the Central Limit Theorem?
