Course Content

Advanced Probability Theory

1. Additional Statements From The Probability Theory

- Course Overview
- Absolutely Continuous and Discrete Random Variables
- Cumulative Distribution Functions and Probability Density Functions
- Characteristics of Random Variables
- Random Vectors
- Useful Properties of the Gaussian Distribution
- Challenge: Detecting Outliers Using 3-Sigma Rule

2. The Limit Theorems of Probability Theory

- Law of Large Numbers
- Law of Large Numbers for Bernoulli Process
- Challenge: Estimate Mean Value Using Law of Large Numbers
- Central Limit Theorem
- Challenge: Application of the CLT to Solving Real Problem

3. Estimation of Population Parameters

- General population. Samples. Population parameters.
- Momentum estimation.
- Maximum Likelihood Estimation
- Challenge: Estimate Parameters of Chi-square Distribution
- Unbiased Estimation
- Challenge: Checking Bias of An Estimation Using Simulation
- Consistent Estimation
- Efficient Estimation
- Confidence Intervals for Population Parameters
- Challenge: Confidence Interval for Exponential Distribution Parameter

4. Testing of Statistical Hypotheses

- What is Statistic Hypothesis?
- Type 1 and Type 2 Errors
- What is P-value?
- Comparing Means of Two Different Datasets
- Challenge: Using CLT to Compare Mean Values of Non-Gaussian Datasets
- Challenge: Resampling Approach to Compare Mean Values of the Datasets
- Testing the Hypothesis of Independence of Two Random Variables

Cumulative Distribution Functions and Probability Density Functions

Cumulative Distribution Function (CDF)

The Cumulative Distribution Function (CDF) of a random variable gives, for each value x, the probability that the variable takes a value less than or equal to x.

Mathematically, the CDF of a random variable X, denoted F(x), is defined as:

F(x) = P{X ≤ x},

that is, the probability that X is less than or equal to x.
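One way to see what F(x) means is to draw many samples from a distribution and count the fraction that fall at or below x. As the number of samples grows, this fraction should approach F(x). Below is a minimal sketch. It assumes a standard normal variable, 100,000 samples, and an arbitrary fixed seed:

```python
import numpy as np
from scipy.stats import norm

# Draw samples from a standard normal distribution (seed fixed for reproducibility)
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0, scale=1, size=100_000)

for x in [-1.0, 0.0, 1.5]:
    empirical = np.mean(samples <= x)  # fraction of samples with X <= x
    exact = norm.cdf(x)                # F(x) for the standard normal
    print(f'x = {x:4}: empirical {empirical:.3f}, F(x) = {exact:.3f}')
```

The empirical fractions should agree with F(x) to roughly two or three decimal places. Drawing more samples tightens the agreement.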

The CDF is a convenient way to describe continuous random variables. In the example below, we create a normally distributed random variable with scipy.stats.norm and plot its CDF, computed with the .cdf() method.


import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Define a normal distribution and the grid of x values for the plot
mu = 0  # mean
sigma = 1  # standard deviation
x = np.linspace(-5, 5, 100)  # x values
rv = norm(loc=mu, scale=sigma)  # create a normal distribution with given mean and standard deviation

# Compute the CDF for the random variable
cdf = rv.cdf(x)

# Plot the CDF
plt.plot(x, cdf, label='CDF')
plt.xlabel('X')
plt.ylabel('CDF')
plt.title('CDF of a Standard Normal Distribution')
plt.legend()
plt.show()
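Every CDF is non-decreasing and runs from 0 at minus infinity to 1 at plus infinity. The short check below tests these properties on the standard normal CDF, using the same grid of x values as the example above:

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-5, 5, 100)
cdf = norm.cdf(x)

# F never decreases as x grows
print(np.all(np.diff(cdf) >= 0))            # True
# Limits at -inf and +inf
print(norm.cdf(-np.inf), norm.cdf(np.inf))  # 0.0 1.0
# The distribution is symmetric around 0, so half of the probability lies below 0
print(norm.cdf(0))                          # 0.5
```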

Using the CDF, we can determine the probability that a random variable falls into any interval of interest. Let X be a random variable with CDF F(x). For any a < b:

P{a < X ≤ b} = F(b) - F(a).

If X is continuous, then P{X = a} = 0. The same difference therefore also gives the probability of the closed interval:

P{X ∈ [a, b]} = F(b) - F(a).


from scipy.stats import norm

# Define a standard normal distribution (a frozen scipy.stats object)
mu = 0  # mean
sigma = 1  # standard deviation
rv = norm(loc=mu, scale=sigma)

# Calculate probabilities for different ranges
print('Normally distributed variable belongs to [-1, 1] with probability:', round(rv.cdf(1) - rv.cdf(-1), 3))
print('Normally distributed variable belongs to [-2, 2] with probability:', round(rv.cdf(2) - rv.cdf(-2), 3))
print('Normally distributed variable belongs to [-3, 3] with probability:', round(rv.cdf(3) - rv.cdf(-3), 3))
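These three numbers are the well-known 68-95-99.7 (three-sigma) rule. They do not depend on the mean and standard deviation. For any normal variable, the interval [mu - k*sigma, mu + k*sigma] carries the same probability as [-k, k] does for the standard normal. Below is a sketch with arbitrarily chosen values mu = 5 and sigma = 3:

```python
from scipy.stats import norm

mu, sigma = 5, 3  # arbitrary example parameters
rv = norm(loc=mu, scale=sigma)

probs = {}
for k in (1, 2, 3):
    a, b = mu - k * sigma, mu + k * sigma
    probs[k] = rv.cdf(b) - rv.cdf(a)  # P{X in [a, b]} = F(b) - F(a)
    print(f'P{{X in [{a}, {b}]}} = {probs[k]:.3f}')
```

The printed probabilities match the ones obtained for the standard normal variable above.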

Percent Point Function (PPF)

The Percent Point Function (PPF), also known as the quantile function, is the inverse of the cumulative distribution function (CDF). Given a probability q, it returns the value x such that F(x) = q, that is, the value that the random variable does not exceed with probability q. In Python (SciPy), it is implemented by the .ppf() method:


from scipy.stats import norm

# Define probabilities
probabilities = [0.1, 0.5, 0.85]

# Iterate over each probability and print the corresponding value of the variable
for i in probabilities:
    # Calculate the value of the variable using the percent point function (inverse of the cumulative distribution function)
    value = norm.ppf(i)
    # Round the value to 3 decimal places for clarity
    value = round(value, 3)
    # Print the result
    print('Normally distributed variable is less than', value, 'with probability', i)
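Because the PPF inverts the CDF, applying .cdf() to the output of .ppf() should return the original probabilities. A common practical use of the PPF is finding critical values. For example, the sketch below finds the point that leaves 2.5% of the standard normal distribution in the upper tail:

```python
import numpy as np
from scipy.stats import norm

probabilities = np.array([0.1, 0.5, 0.85])
values = norm.ppf(probabilities)  # quantiles x with F(x) = q
recovered = norm.cdf(values)      # applying the CDF undoes the PPF
print(np.allclose(recovered, probabilities))  # True

# Two-sided 95% critical value: 2.5% of the probability lies above z
z = norm.ppf(0.975)
print(round(z, 3))  # 1.96
```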

Probability Density Function (PDF)

The Probability Density Function (PDF), usually denoted f(x), describes how the probability of a continuous random variable is spread over its range. Its role is similar to that of the PMF for discrete variables, with one important difference: f(x) is a probability density at the point x, not the probability that X equals x. For a continuous variable, P{X = x} = 0 for every single value x.

The PDF defines the shape of the probability distribution of a continuous random variable.

The following example computes the PDF of the standard normal distribution with the .pdf() method and plots it.


import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Generate x values for plotting
x = np.linspace(-3, 3, 100)

# Calculate the probability density function (PDF) values for the standard normal distribution
pdf_values = norm.pdf(x, loc=0, scale=1)

# Plot the PDF
plt.plot(x, pdf_values, label='PDF')  # Plot PDF values against x values
plt.xlabel('X')  # Label for x-axis
plt.ylabel('PDF')  # Label for y-axis
plt.title('PDF of a Standard Normal Distribution')  # Title of the plot
plt.legend()  # Show legend
plt.show()  # Display the plot
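The PDF and the CDF of a continuous variable are linked: the PDF is the derivative of the CDF, f(x) = F'(x). The sketch below checks this numerically by approximating the derivative with a central finite difference. The step h = 1e-5 is an arbitrary choice:

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-3, 3, 601)
h = 1e-5  # finite-difference step (arbitrary small value)

# Central finite difference approximates the derivative F'(x)
numeric_derivative = (norm.cdf(x + h) - norm.cdf(x - h)) / (2 * h)

# Largest gap between the approximate derivative and the exact PDF
max_error = np.max(np.abs(numeric_derivative - norm.pdf(x)))
print(f"max |F'(x) - f(x)| = {max_error:.2e}")
```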

The PDF shows how densely probability is concentrated around each value. A higher PDF value means the variable is more likely to fall in a small neighborhood of that point. A lower value means it is less likely to.
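Keep in mind that a PDF value is a density, not a probability, so it can be larger than 1. For a small interval of width dx around x, the probability is approximately f(x)·dx. The sketch below uses a narrow normal distribution, with sigma = 0.1 as an arbitrary choice:

```python
from scipy.stats import norm

narrow = norm(loc=0, scale=0.1)  # small sigma -> tall, narrow density
density = narrow.pdf(0)          # about 3.989, greater than 1
print(round(density, 3))

# Probability of the interval [-0.01, 0.01] (width 0.02)
exact = narrow.cdf(0.01) - narrow.cdf(-0.01)
approx = density * 0.02          # density times interval width
print(round(exact, 4), round(approx, 4))
```

Even though the density exceeds 1, the probability of the interval stays small, and density times width is a close approximation of it.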

For a discrete variable, the probability of falling within a range is the sum of the PMF over all values in that range. A continuous variable can take infinitely many values in any range, so instead of a simple sum we compute the area under the PDF curve over that range. If f(x) denotes the PDF, this area is the integral:

P{X ∈ [a, b]} = ∫ from a to b of f(x) dx = F(b) - F(a).
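The area under the PDF can be computed numerically and compared with the CDF-based result. The sketch below does this with scipy.integrate.quad. It also adds a plain Riemann sum to show how a "sum" of PDF values becomes an area once each value is multiplied by a small width dx. The interval [-1, 1] and the grid size are arbitrary choices:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

a, b = -1, 1

# Numerical integral of the standard normal PDF over [a, b]
area, abs_err = quad(norm.pdf, a, b)

# Left Riemann sum: PDF values times the width of each small step
x = np.linspace(a, b, 10_001)
dx = x[1] - x[0]
riemann = np.sum(norm.pdf(x[:-1]) * dx)

# The same probability obtained from the CDF
via_cdf = norm.cdf(b) - norm.cdf(a)

print(round(area, 4), round(riemann, 4), round(via_cdf, 4))
```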


Section 1. Chapter 3
