Course Contents

Advanced Probability Theory

1. Additional Statements From The Probability Theory

Course Overview
Absolutely Continuous and Discrete Random Variables
Cumulative Distribution Functions and Probability Density Functions
Characteristics of Random Variables
Random Vectors
Useful Properties of the Gaussian Distribution
Challenge: Detecting Outliers Using 3-Sigma Rule

2. The Limit Theorems of Probability Theory

Law of Large Numbers
Law of Large Numbers for Bernoulli Process
Challenge: Estimate Mean Value Using Law of Large Numbers
Central Limit Theorem
Challenge: Application of the CLT to Solving Real Problem

3. Estimation of Population Parameters

General population. Samples. Population parameters.
Momentum estimation. Maximum Likelihood Estimation
Challenge: Estimate Parameters of Chi-square Distribution
Unbiased Estimation
Challenge: Checking Bias of An Estimation Using Simulation
Consistent Estimation
Efficient Estimation
Confidence Intervals for Population Parameters
Challenge: Confidence Interval for Exponential Distribution Parameter

4. Testing of Statistical Hypotheses

What is Statistic Hypothesis?
Type 1 and Type 2 Errors
What is P-value?
Comparing Means of Two Different Datasets
Challenge: Using CLT to Compare Mean Values of Non-Gaussian Datasets
Challenge: Resampling Approach to Compare Mean Values of the Datasets
Testing the Hypothesis of Independence of Two Random Variables

Law of Large Numbers

The Law of Large Numbers is a fundamental concept in probability theory and statistics that states that as the sample size increases, the average of the observed values will converge to the expected value or mean of the underlying distribution.

Mathematical definition of the law
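
A standard way to write the (weak) law of large numbers, consistent with the conditions explained below, is the following. The symbols X_i, n, and \mu are notation chosen here for illustration and may differ from the course's own formula:

```latex
X_1, X_2, \dots \ \text{i.i.d.}, \qquad \mathbb{E}\lvert X_1 \rvert < \infty, \qquad \mu = \mathbb{E}[X_1]
\quad \Longrightarrow \quad
\frac{1}{n} \sum_{i=1}^{n} X_i \ \xrightarrow{\,p\,} \ \mu \quad \text{as } n \to \infty
```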

Let's provide some explanations of this law:

The first condition is that we have a sequence of random variables that are independent and identically distributed (i.i.d.). Independent means that no variable influences any other; identically distributed means that every variable follows exactly the same distribution, with the same parameters. For instance, N(1, 2) and N(1, 3) are not identically distributed because although they're both Gaussian, they have different variances;
The second condition is that these variables must have a finite expectation. This means the series (in the discrete case) or integral (in the continuous case) that defines the expectation must converge to a finite number, as discussed in Chapter 2 of the first section;
The law of large numbers states that if the first two conditions are met, then as we take more variables, the average of these variables gets closer to the real expectation.
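
As a minimal sketch of the three points above, the snippet below simulates rolls of a fair six-sided die. The die, the seed, and the sample size are assumptions made for illustration, not part of the course dataset. The rolls are i.i.d., their expectation is finite (3.5), so the running average should approach 3.5 as more rolls are included.

```python
import numpy as np

# Hypothetical illustration: rolls of a fair six-sided die (not the course dataset)
rng = np.random.default_rng(seed=0)
n = 100_000
rolls = rng.integers(1, 7, size=n)  # i.i.d. values in {1, ..., 6}; upper bound is exclusive

# Finite expectation: (1 + 2 + ... + 6) / 6 = 3.5
expected_value = np.arange(1, 7).mean()

# Running average of the first k rolls for k = 1, ..., n
running_mean = np.cumsum(rolls) / np.arange(1, n + 1)

for k in (10, 1_000, 100_000):
    error = abs(running_mean[k - 1] - expected_value)
    print(f"n = {k:>7}: average = {running_mean[k - 1]:.4f}, error = {error:.4f}")
```

Using a cumulative sum gives every running average in a single pass, which avoids recomputing the mean from scratch for each subsample size.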

Note

In the law's statement, you might see the letter 'p' above the arrow. It denotes convergence in probability, one of the senses in which a sequence of random variables can converge to a limit. To apply the law of large numbers in practice, you don't need the details of this type of convergence, so we won't cover it in this course.

The visualization of the law

To check whether the law of large numbers holds, run the code examples several times. The samples are reshuffled or regenerated on every run, so the variables are summed in a different order each time. If the law holds, the running average tends toward the true expectation regardless of the order in which the variables are summed.


import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Import the dataset (a single column of Gaussian samples, no header row)
samples = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/Advanced+Probability+course+media/gaussian_samples.csv', names=['Value'])

# Shuffle the samples so that each run sums them in a different order
samples = samples.sample(frac=1)

# Mean of the first `subsample_size` elements
def mean_value(data, subsample_size):
    return data['Value'].iloc[:subsample_size].mean()

# Subsample sizes start at 1: the mean of zero elements is undefined
x = np.arange(1, 5000)
y = np.zeros(len(x))

# Loop through the subsample sizes and calculate the mean for each
for i, size in enumerate(x):
    y[i] = mean_value(samples, size)

# Plotting the results
plt.plot(x, y, label='Estimated mean')
plt.xlabel('Number of elements to calculate mean value')
plt.ylabel('Mean value')
plt.axhline(y=0, color='k', label='Real mean')
plt.legend()
plt.show()

The plot above shows that the more terms we take, the closer the estimated mean gets to the real value: the variance of the estimated mean decreases as the sample size grows.
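
The decreasing variance can be checked numerically. For i.i.d. variables with finite variance sigma^2, the variance of the average of n of them equals sigma^2 / n. The sketch below assumes standard Gaussian samples (sigma = 1) generated on the spot rather than the course dataset; the seed, the number of repetitions, and the sample sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
sigma = 1.0          # assumed standard deviation of each individual sample
repetitions = 2_000  # number of independent averages computed per sample size

empirical = {}
for n in (10, 100, 1_000):
    # Each row is one experiment: n i.i.d. N(0, sigma^2) samples
    data = rng.normal(loc=0.0, scale=sigma, size=(repetitions, n))
    means = data.mean(axis=1)      # one sample mean per experiment
    empirical[n] = means.var()     # spread of those sample means
    print(f"n = {n:>5}: empirical variance = {empirical[n]:.5f}, "
          f"sigma^2 / n = {sigma**2 / n:.5f}")
```

The empirical variance should track sigma^2 / n closely, so multiplying the sample size by 10 shrinks the variance of the estimate by roughly a factor of 10.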

Now let's look at data drawn from the Cauchy distribution and see whether the law of large numbers holds for it (don't forget to run the code several times and compare the results):


import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import cauchy

# Set the location parameter to 0 and generate 5000 samples
loc = 0
samples = cauchy.rvs(loc=loc, size=5000)

# Mean of the first `subsample_size` elements
def mean_value(data, subsample_size):
    return data[:subsample_size].mean()

# Subsample sizes start at 1: the mean of zero elements is undefined
x = np.arange(1, 5000)
y = np.zeros(len(x))

# Loop through the subsample sizes and calculate the mean for each
for i, size in enumerate(x):
    y[i] = mean_value(samples, size)

# Plotting the results
plt.plot(x, y, label='Estimated mean')
plt.xlabel('Number of elements to calculate mean value')
plt.ylabel('Mean value')
plt.legend()
plt.show()

In the first case, the plot always converges to zero regardless of the order of summation. The fluctuations around zero decrease as more terms are added.

However, in the second case, the plot doesn't converge and behaves unpredictably. This is because the Cauchy distribution has no finite expectation (its mean is undefined), which violates the second condition of the law of large numbers.
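
The contrast between the two cases can also be seen without a plot. The sketch below repeats the averaging experiment many times and measures how widely the resulting sample means are spread. The interquartile range is used as the measure of spread instead of the variance because it remains well defined for Cauchy data. The seed, the number of repetitions, and the helper name `iqr_of_means` are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
repetitions = 2_000  # number of independent sample means per setting

def iqr_of_means(sampler, n):
    """Interquartile range of `repetitions` sample means, each over n draws."""
    means = sampler(size=(repetitions, n)).mean(axis=1)
    q25, q75 = np.percentile(means, [25, 75])
    return q75 - q25

spread = {}
for n in (10, 1_000):
    gaussian_iqr = iqr_of_means(rng.standard_normal, n)
    cauchy_iqr = iqr_of_means(rng.standard_cauchy, n)
    spread[n] = (gaussian_iqr, cauchy_iqr)
    print(f"n = {n:>5}: Gaussian IQR = {gaussian_iqr:.4f}, Cauchy IQR = {cauchy_iqr:.4f}")
```

For Gaussian samples the spread of the mean shrinks as n grows, while for Cauchy samples it stays at about the same level: the average of n standard Cauchy variables is itself standard Cauchy, so averaging more terms does not make the estimate any more precise.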


Section 2. Chapter 1
