
# Central Limit Theorem

**The central limit theorem** is a fundamental theorem of statistics which states that the sum of a large number of independent, identically distributed random variables will be approximately normally distributed, regardless of the underlying distribution of the individual random variables.

## Theorem formulation

The formal statement of the theorem is as follows:
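For i.i.d. random variables $X_1, \dots, X_n$, each with mean $\mu$ and finite variance $\sigma^2$, the standard statement reads:

$$\frac{X_1 + X_2 + \dots + X_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} \mathcal{N}(0,\, 1), \quad n \to \infty.$$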

As in the law of large numbers, the statement of the Central Limit Theorem contains a letter 'd' above the arrow. This letter denotes the so-called **convergence in distribution**. In simple words, it can be interpreted as follows: the more terms we have, the closer the PDF of the sum of these terms gets to the PDF of the Gaussian distribution.

Instead of the last line in the formulation above, another is often used:
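With $\mu$ and $\sigma^2$ denoting the mean and variance of each term, this alternative line is commonly written as:

$$X_1 + X_2 + \dots + X_n \approx \mathcal{N}(n\mu,\; n\sigma^2).$$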

In this formulation, we don't talk about convergence anymore. Instead, we assert that the sum follows a Gaussian distribution law with certain parameters right away. However, it's important to note that this approximation **only holds for large values of n**.

For each specific distribution, the required value of n differs, but generally, if n **is not less than** `35`, this approximation works with reasonably high accuracy.

## Illustration of the theorem

Take a look at the illustration below: we'll calculate the PDF of the **sum of uniformly distributed variables**. As shown in the illustration, the resulting PDF becomes more similar to a Gaussian PDF as we use more and more terms to calculate the sum.
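The same effect can also be checked numerically: as the number of terms grows, the standardized sum of uniform variables places the Gaussian share of its mass (about 68.3%) within one standard deviation of the mean. A minimal sketch in Python (the sample size and the set of n values here are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# How fast does the sum of Uniform(0, 1) variables approach a Gaussian?
fracs = {}
for n in (1, 2, 5, 30):
    # 100_000 independent sums of n uniform variables each
    sums = rng.uniform(0.0, 1.0, size=(100_000, n)).sum(axis=1)
    # standardize with the exact mean n/2 and variance n/12 of the sum
    z = (sums - n / 2) / np.sqrt(n / 12)
    # for a standard Gaussian, about 68.3% of values fall within one sigma
    fracs[n] = np.mean(np.abs(z) < 1)

for n, frac in fracs.items():
    print(f"n={n:2d}  P(|Z| < 1) = {frac:.3f}")
```

For `n = 1` the fraction stays near 0.577 (the uniform distribution is far from Gaussian), while by `n = 30` it is already close to the Gaussian value of 0.683.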

Now let's look at the PMF of the sum of Binomial variables:
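Because the variables are discrete here, the PMF of the sum can be computed exactly: the PMF of a sum of independent variables is the convolution of their PMFs. A short sketch (the parameters `n`, `m`, and `p` below are illustrative choices, not taken from the original figure):

```python
import numpy as np
from math import comb

def binom_pmf(m, p):
    """Exact PMF of a single Binomial(m, p) variable on 0..m."""
    return np.array([comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(m + 1)])

def sum_pmf(n, m, p):
    """PMF of the sum of n i.i.d. Binomial(m, p) variables via repeated convolution."""
    single = binom_pmf(m, p)
    pmf = single.copy()
    for _ in range(n - 1):
        pmf = np.convolve(pmf, single)
    return pmf

# sum of 5 Binomial(10, 0.3) variables: a bell-shaped PMF centered near n*m*p = 15
pmf = sum_pmf(5, 10, 0.3)
print("mode of the sum:", pmf.argmax())
```

Plotting `pmf` for increasing `n` reproduces the picture: the bars trace out an increasingly Gaussian-looking bell curve.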

## CLT implementation

We'll create `500` samples, each containing hundreds of random variables from an exponential distribution.

For each of these `500` samples, we'll calculate the sum of its random variables and create a histogram from the resulting `500` values. Then, we'll compare this histogram with a PDF plot of a Gaussian random variable.
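A sketch of this experiment in Python (the per-sample count of `200` variables and the exponential scale of `1.0` are assumed choices, since the text only says "hundreds"):

```python
import numpy as np

rng = np.random.default_rng(42)

n_samples = 500  # number of samples, as stated above
n_vars = 200     # "hundreds" of exponential variables per sample (assumed value)
scale = 1.0      # scale parameter 1/lambda of the exponential (assumed value)

# 500 sums, one per sample of 200 exponential variables
sums = rng.exponential(scale, size=(n_samples, n_vars)).sum(axis=1)

# CLT parameters of the approximating Gaussian: mean n*mu, std sqrt(n)*sigma,
# where a single exponential has mean mu = scale and std sigma = scale
mu = n_vars * scale
sigma = np.sqrt(n_vars) * scale

# density histogram of the sums, and the Gaussian PDF at the bin centers
density, edges = np.histogram(sums, bins=30, density=True)
centers = (edges[:-1] + edges[1:]) / 2
pdf = np.exp(-((centers - mu) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

print(f"max |histogram - Gaussian PDF| = {np.abs(density - pdf).max():.4f}")
```

Replacing the `np.histogram` call with `matplotlib`'s `plt.hist(sums, bins=30, density=True)` and overlaying `pdf` gives the visual comparison described above.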

We can observe that the resulting histogram **closely matches** the PDF of the Gaussian distribution. This confirms the validity of the theorem, demonstrating its applicability in real-world scenarios!
