Course Content

Advanced Probability Theory

1. Additional Statements From The Probability Theory

Course Overview
Absolutely Continuous and Discrete Random Variables
Cumulative Distribution Functions and Probability Density Functions
Characteristics of Random Variables
Random Vectors
Useful Properties of the Gaussian Distribution
Challenge: Detecting Outliers Using 3-Sigma Rule

2. The Limit Theorems of Probability Theory

Law of Large Numbers
Law of Large Numbers for Bernoulli Process
Challenge: Estimate Mean Value Using Law of Large Numbers
Central Limit Theorem
Challenge: Application of the CLT to Solving Real Problem

3. Estimation of Population Parameters

General population. Samples. Population parameters.
Momentum estimation.
Maximum Likelihood Estimation
Challenge: Estimate Parameters of Chi-square Distribution
Unbiased Estimation
Challenge: Checking Bias of An Estimation Using Simulation
Consistent Estimation
Efficient Estimation
Confidence Intervals for Population Parameters
Challenge: Confidence Interval for Exponential Distribution Parameter

4. Testing of Statistical Hypotheses

What is Statistic Hypothesis?
Type 1 and Type 2 Errors
What is P-value?
Comparing Means of Two Different Datasets
Challenge: Using CLT to Compare Mean Values of Non-Gaussian Datasets
Challenge: Resampling Approach to Compare Mean Values of the Datasets
Testing the Hypothesis of Independence of Two Random Variables

Confidence Intervals for Population Parameters

In the previous chapters we considered how to estimate population parameters and how to check the quality of these estimates. However, all of those were point estimates: we simply computed a single plausible value of the parameter from the data at hand. There is another approach: we can construct an interval that, with a given probability, covers the true value of the unknown parameter. Such an interval is called a confidence interval, and this probability is called the confidence level.
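Formally (the notation here is chosen for illustration): let X_1, ..., X_n be a sample from a distribution with an unknown parameter θ, and let γ be a confidence level between 0 and 1. A pair of statistics θ_low(X_1, ..., X_n) ≤ θ_up(X_1, ..., X_n) forms a confidence interval of level γ if, for every possible value of θ,

```latex
P\left(\theta_{\mathrm{low}}(X_1,\dots,X_n) < \theta < \theta_{\mathrm{up}}(X_1,\dots,X_n)\right) \ge \gamma .
```

The endpoints are random because they are computed from the sample, while θ itself is a fixed constant.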

Confidence intervals are constructed on a principle similar to that of point estimates: we again take a function whose arguments are the sample values. We then use the known distribution of this function to build the interval. A rigorous mathematical treatment of this process can be quite complicated, so we will not go into it in detail.
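A common version of this idea is the pivot method. As a sketch, assume we can find a function g(X, θ) of the sample X and the unknown parameter θ whose distribution is continuous, known, and does not depend on θ; let q_p denote its quantile of order p and γ the confidence level (this notation is introduced here). Then

```latex
P\left(q_{(1-\gamma)/2} < g(X, \theta) < q_{(1+\gamma)/2}\right)
  = \frac{1+\gamma}{2} - \frac{1-\gamma}{2} = \gamma .
```

If g is monotonic in θ, the double inequality can be solved for θ and rewritten as θ_low(X) < θ < θ_up(X), which still holds with probability γ. For the mean μ of a Gaussian distribution with known standard deviation σ, for example, one can take g(X, μ) = √n (X̄ − μ) / σ, which has the standard Gaussian distribution.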

Note

Note that there is another type of interval estimate for population parameters, the credible interval, which is constructed using Bayes' theorem. The two types of intervals are interpreted differently:

A confidence interval is an interval with random endpoints that covers the true, fixed value of the parameter with a given probability;

A credible interval, in contrast, is a fixed interval into which the parameter, treated as a random variable, falls with a given probability.
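The interpretation of a confidence interval can be checked by simulation. The sketch below works under assumed settings: a Gaussian population with a hypothetical mean of 5 and standard deviation of 2, samples of size 30, and a confidence level of 0.95. It draws many independent samples, builds the Student-based interval for the mean from each one (the unknown-variance case discussed later in this chapter), and measures how often these random intervals cover the fixed true mean. The share should come out close to 0.95.

```python
import numpy as np
from scipy import stats

# Hypothetical population parameters and experiment settings (illustration only)
true_mean, true_std = 5.0, 2.0
n, n_experiments, conf_level = 30, 10_000, 0.95

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

# Each row is an independent sample of size n from the same Gaussian population
samples = rng.normal(true_mean, true_std, size=(n_experiments, n))

# Student-based interval for each sample: mean +/- t-quantile * standard error
t_value = stats.t.ppf((1 + conf_level) / 2, n - 1)
means = samples.mean(axis=1)
half_widths = t_value * samples.std(axis=1, ddof=1) / np.sqrt(n)

# The endpoints vary from sample to sample; the true mean stays fixed
covered = (means - half_widths < true_mean) & (true_mean < means + half_widths)
coverage = covered.mean()

print(f'Share of intervals that cover the true mean: {coverage:.3f}')
```

Vectorizing over a 2-D array (one row per experiment) avoids a Python loop over 10,000 samples.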

Confidence interval for the expectation of a Gaussian distribution

Let's look at how to build a confidence interval for the expectation of a Gaussian distribution. We will consider two different situations: the variance of the distribution is known, and the variance is unknown.

If the variance is known, we build the interval for the expectation from the sample mean and the PPF (percent point function, i.e. the quantile function) of the standard Gaussian distribution.
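For reference, the standard form of this interval follows from the fact that √n (X̄ − μ) / σ has the standard Gaussian distribution. Here X̄ is the sample mean, σ the known standard deviation, n the sample size, γ the confidence level and z_p the quantile of order p of the standard Gaussian distribution (notation chosen here):

```latex
\left(\bar{X} - z_{(1+\gamma)/2}\,\frac{\sigma}{\sqrt{n}},\;
      \bar{X} + z_{(1+\gamma)/2}\,\frac{\sigma}{\sqrt{n}}\right),
\qquad z_p = \Phi^{-1}(p),
```

where Φ is the CDF of the standard Gaussian distribution, so Φ^{-1} is its PPF (scipy.stats.norm.ppf).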

If the variance is unknown, we replace it with the adjusted sample variance computed from the same sample. In this case the interval is built with the PPF of the Student distribution with n-1 degrees of freedom, where n is the sample size.
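Written out in the same notation as the known-variance case, with S² the adjusted sample variance and t_{p, n-1} the quantile of order p of the Student distribution with n-1 degrees of freedom (notation chosen here), the pivot and the resulting interval are:

```latex
S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2,
\qquad
\frac{\sqrt{n}\,(\bar{X} - \mu)}{S} \sim t_{n-1},
\]
\[
\left(\bar{X} - t_{(1+\gamma)/2,\,n-1}\,\frac{S}{\sqrt{n}},\;
      \bar{X} + t_{(1+\gamma)/2,\,n-1}\,\frac{S}{\sqrt{n}}\right).
```

Compared with the known-variance interval, σ is replaced by S and the Gaussian quantile by the Student quantile.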

Confidence interval with Python

Let's now build confidence intervals for the mean of Gaussian samples in Python. The variance is treated as unknown, so we use the Student distribution. We will compute intervals for several confidence levels and compare them.

import numpy as np
from scipy import stats
import pandas as pd

# Load the dataset (the column name is set explicitly with names=)
samples = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/Advanced+Probability+course+media/gaussian_samples.csv', names=['Value'])
data = samples['Value'].to_numpy()

# Sample size and degrees of freedom of the Student distribution
n = len(data)
df = n - 1

# Sample mean, adjusted sample variance and standard error do not depend on the confidence level
mean = np.mean(data)
adjusted_var = np.var(data, ddof=1)
std_error = np.sqrt(adjusted_var) / np.sqrt(n)

# Build confidence intervals with different confidence levels
for conf_level in [0.9, 0.95, 0.99]:
    # Quantile of order (1 + confidence level) / 2 of the Student distribution
    t_value = stats.t.ppf((1 + conf_level) / 2, df)

    # Lower and upper bounds of the confidence interval
    lower_bound = mean - t_value * std_error
    upper_bound = mean + t_value * std_error

    # Print the result
    print(f'{conf_level:.0%} confidence interval for mean value is: ({lower_bound:.2f}, {upper_bound:.2f})')
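As a cross-check, SciPy's stats.t.interval should return the same bounds when given the sample mean as loc and the standard error (stats.sem, which uses ddof=1 by default) as scale. The sketch below uses a synthetic Gaussian sample with hypothetical parameters instead of the course CSV, so it runs offline.

```python
import numpy as np
from scipy import stats

# Synthetic Gaussian sample used in place of the course dataset (assumption)
rng = np.random.default_rng(42)
data = rng.normal(loc=10.0, scale=3.0, size=50)

n = len(data)
mean = np.mean(data)
std_error = stats.sem(data)  # sample std with ddof=1, divided by sqrt(n)

results = {}
for conf_level in [0.9, 0.95, 0.99]:
    # Manual construction with the Student quantile
    t_value = stats.t.ppf((1 + conf_level) / 2, n - 1)
    manual = (mean - t_value * std_error, mean + t_value * std_error)
    # The same interval from SciPy (first argument is the confidence level)
    built_in = stats.t.interval(conf_level, n - 1, loc=mean, scale=std_error)
    results[conf_level] = (manual, built_in)
    print(f'{conf_level:.0%}: manual ({manual[0]:.4f}, {manual[1]:.4f}), '
          f'scipy ({built_in[0]:.4f}, {built_in[1]:.4f})')
```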

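We see that the higher the confidence level, the wider the interval. This is expected: the sample mean and the standard error do not change, while the Student quantile (t_value in the code) grows with the confidence level. A higher probability of covering the true mean is therefore paid for with a wider interval.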
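The effect can be isolated from any particular dataset: for a fixed sample size and standard deviation, the width 2 · t_{(1+γ)/2, n-1} · s / √n depends on the confidence level only through the quantile. The values n = 50 and s = 1 below are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical sample size and adjusted sample standard deviation (illustration only)
n, s = 50, 1.0
std_error = s / np.sqrt(n)

levels = [0.9, 0.95, 0.99]
# Full width of the interval: 2 * Student quantile * standard error
widths = [2 * stats.t.ppf((1 + c) / 2, n - 1) * std_error for c in levels]

for c, w in zip(levels, widths):
    print(f'{c:.0%} interval width: {w:.3f}')
```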

Section 3. Chapter 8
