Probability Theory Mastering

# Cumulative Distribution Functions and Probability Density Functions

## Cumulative Distribution Function (CDF)

**The Cumulative Distribution Function (CDF)** is a function that gives the probability that a random variable takes on a value less than or equal to a given value.

Mathematically, the CDF of a random variable X, denoted F(x), is defined as:

$$F(x) = P(X \le x)$$

that is, the probability that X takes a value less than or equal to x.
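The definition can be checked empirically: the fraction of sampled observations that are less than or equal to x should approach F(x). The sketch below assumes a standard normal variable; `empirical_cdf` is a hypothetical helper written for this illustration, not a SciPy function.

```python
import numpy as np
from scipy.stats import norm

# Draw many samples from a standard normal distribution (seeded for reproducibility)
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0, scale=1, size=100_000)

def empirical_cdf(data, x):
    """Hypothetical helper: fraction of observations <= x, an estimate of F(x)."""
    return np.mean(data <= x)

# Compare the empirical estimate with the exact CDF at a few points
for x in [-1.0, 0.0, 1.5]:
    print(f'x = {x:+.1f}: empirical F(x) = {empirical_cdf(samples, x):.3f}, '
          f'exact F(x) = {norm.cdf(x):.3f}')
```

With 100,000 samples the two columns typically agree to about two decimal places.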

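The CDF is defined for any random variable, but it is especially convenient for describing continuous random variables: the probability that such a variable takes any single exact value is zero, so probabilities are naturally expressed over ranges of values.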
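One way to see that single values carry zero probability is to shrink an interval [x - h, x + h] around a point and compute its probability as F(x + h) - F(x - h). This sketch assumes a standard normal variable and an arbitrary point x = 0.5, both chosen only for illustration.

```python
from scipy.stats import norm

rv = norm(loc=0, scale=1)  # standard normal, chosen for illustration
x = 0.5  # an arbitrary point of interest

# P(x - h <= X <= x + h) = F(x + h) - F(x - h) shrinks toward 0 as h shrinks
probs = {}
for h in [1.0, 0.1, 0.01, 0.001]:
    probs[h] = rv.cdf(x + h) - rv.cdf(x - h)
    print(f'h = {h}: P({x - h:.3f} <= X <= {x + h:.3f}) = {probs[h]:.6f}')
```

As h approaches 0, the probability goes to 0, which is why P(X = x) = 0 for a continuous variable.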

Look at the example below: we create a normally distributed random variable and compute its CDF using the `.cdf()` method.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Generate a random variable following a normal distribution
mu = 0  # mean
sigma = 1  # standard deviation
x = np.linspace(-5, 5, 100)  # x values
rv = norm(loc=mu, scale=sigma)  # create a normal distribution with given mean and standard deviation

# Compute the CDF for the random variable
cdf = rv.cdf(x)

# Plot the CDF
plt.plot(x, cdf, label='CDF')
plt.xlabel('X')
plt.ylabel('CDF')
plt.title('CDF of a Standard Normal Distribution')
plt.legend()
plt.show()
```

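Using the CDF, we can determine the probability that the random variable belongs to any interval of interest. Assume that X is a continuous random variable and F(x) is its CDF. **To determine the probability that the variable X belongs to the interval [a, b], we can use the following formula**:

$$P(X \in [a, b]) = F(b) - F(a)$$

For a general (for example, discrete) random variable, F(b) - F(a) gives P(a < X ≤ b); for a continuous variable the endpoint a has zero probability, so the two are equal.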

```python
from scipy.stats import norm

# Generate a random variable following a normal distribution
mu = 0  # mean
sigma = 1  # standard deviation
rv = norm(loc=mu, scale=sigma)

# Calculate probabilities for different ranges
print('Normally distributed variable belongs to [-1, 1] with probability:', round(rv.cdf(1) - rv.cdf(-1), 3))
print('Normally distributed variable belongs to [-2, 2] with probability:', round(rv.cdf(2) - rv.cdf(-2), 3))
print('Normally distributed variable belongs to [-3, 3] with probability:', round(rv.cdf(3) - rv.cdf(-3), 3))
```
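To check the formula P(X ∈ [a, b]) = F(b) - F(a) independently of the CDF, we can simulate many draws and count how many land in [a, b]. The helper `interval_probability` and the interval [-0.5, 2.0] are hypothetical choices for this sketch; sampling uses SciPy's `rvs` with a NumPy `Generator`.

```python
import numpy as np
from scipy.stats import norm

rv = norm(loc=0, scale=1)

def interval_probability(dist, a, b):
    """Hypothetical helper: P(a <= X <= b) for a continuous distribution, as F(b) - F(a)."""
    return dist.cdf(b) - dist.cdf(a)

# Cross-check the formula against simulated data
rng = np.random.default_rng(seed=42)
samples = rv.rvs(size=200_000, random_state=rng)

a, b = -0.5, 2.0
exact = interval_probability(rv, a, b)
simulated = np.mean((samples >= a) & (samples <= b))  # fraction of draws inside [a, b]
print(f'F(b) - F(a) = {exact:.4f}, fraction of samples in [a, b] = {simulated:.4f}')
```

The simulated fraction should differ from F(b) - F(a) only by sampling noise, which shrinks as the number of draws grows.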

## Percent Point Function (PPF)

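The **Percent Point Function (PPF)**, also known as the quantile function, is the inverse of the cumulative distribution function (CDF). It is used to find the value of a random variable that **corresponds to a given probability**: for a continuous distribution and a probability q, it returns the value x such that F(x) = q. In Python (SciPy), it is implemented by the `.ppf()` method: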

```python
from scipy.stats import norm

# Define probabilities
probabilities = [0.1, 0.5, 0.85]

# Iterate over each probability and print the corresponding value of the variable
for p in probabilities:
    # Calculate the value of the variable using the percent point function
    # (inverse of the cumulative distribution function)
    value = norm.ppf(p)
    # Round the value to 3 decimal places for clarity
    value = round(value, 3)
    # Print the result
    print('Normally distributed variable is less than', value, 'with probability', p)
```
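Because the PPF inverts the CDF, applying `.cdf()` to the output of `.ppf()` should return the original probabilities. The sketch below, assuming a standard normal distribution, checks this and uses two quantiles to find a central interval containing 95% of the probability mass.

```python
import numpy as np
from scipy.stats import norm

rv = norm(loc=0, scale=1)

# ppf undoes cdf: F(F^{-1}(q)) = q for probabilities q strictly between 0 and 1
q = np.array([0.025, 0.5, 0.975])
x = rv.ppf(q)
print('quantiles:', np.round(x, 3))
print('cdf of quantiles:', np.round(rv.cdf(x), 3))

# A central interval holding 95% of the probability mass: [ppf(0.025), ppf(0.975)]
lower, upper = rv.ppf(0.025), rv.ppf(0.975)
print(f'95% of the mass lies in [{lower:.3f}, {upper:.3f}]')
```

For the standard normal this interval is approximately [-1.96, 1.96], and it is symmetric because the distribution is symmetric around 0.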

## Probability Density Function (PDF)

The **Probability Density Function (PDF)** describes how probability is spread over the range of a continuous random variable: its value at a point measures how likely the variable is to fall near that point, not the probability of that exact value. Its interpretation is similar to that of the PMF, but it is used specifically for continuous random variables.

The PDF defines the **shape of the probability distribution** of a continuous random variable.

Let's consider the following example of a PDF calculated using the `.pdf()` method.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Generate x values for plotting
x = np.linspace(-3, 3, 100)

# Calculate the probability density function (PDF) values for the standard normal distribution
pdf_values = norm.pdf(x, loc=0, scale=1)

# Plot the PDF
plt.plot(x, pdf_values, label='PDF')  # Plot PDF values against x values
plt.xlabel('X')  # Label for x-axis
plt.ylabel('PDF')  # Label for y-axis
plt.title('PDF of a Standard Normal Distribution')  # Title of the plot
plt.legend()  # Show legend
plt.show()  # Display the plot
```
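For a continuous distribution, the PDF is the derivative of the CDF, f(x) = F'(x). The sketch below, assuming a standard normal distribution, differentiates the CDF numerically with `np.gradient` and compares the result with `norm.pdf`.

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-3, 3, 601)  # grid with spacing 0.01
cdf_values = norm.cdf(x)

# Numerically differentiate the CDF; np.gradient uses central differences in the interior
approx_pdf = np.gradient(cdf_values, x)
exact_pdf = norm.pdf(x)

print('max |dF/dx - f(x)|:', np.max(np.abs(approx_pdf - exact_pdf)))
```

The maximum difference is tiny, confirming that the slope of the CDF at each point is the PDF value there.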

The PDF provides insight into the **likelihood**, or **probability density**, of a random variable taking values near a specific point. Higher PDF values mean that values in that neighborhood are more likely, while lower values mean they are less likely. A PDF value is a density, not a probability, so it can be greater than 1.
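A narrow distribution makes the density-versus-probability distinction concrete. The sketch assumes a normal distribution with standard deviation 0.1 (an illustrative choice): its PDF at 0 is well above 1, yet the probability of a small interval is roughly density × width.

```python
from scipy.stats import norm

# A narrow normal distribution (standard deviation 0.1) is tightly concentrated around 0
narrow = norm(loc=0, scale=0.1)

density_at_peak = narrow.pdf(0)  # a density value, not a probability
print('PDF at x = 0:', round(density_at_peak, 3))

# The probability of a small interval is roughly density * width
width = 0.01
prob = narrow.cdf(width / 2) - narrow.cdf(-width / 2)
print('P(-0.005 <= X <= 0.005):', round(prob, 5))
print('density * width:', round(density_at_peak * width, 5))
```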

To determine the probability of a continuous variable falling within a specific range, we cannot simply sum values as we would with a PMF, because a continuous variable has infinitely many possible values in any range. Instead, we calculate the **area under the PDF curve** over that range, which is the integral of the PDF f(x):

$$P(a \le X \le b) = \int_a^b f(x)\,dx = F(b) - F(a)$$
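The area-under-the-curve idea can be verified numerically. This sketch, assuming a standard normal distribution and an illustrative interval [-1, 2], integrates `norm.pdf` with `scipy.integrate.quad` and compares the result with the CDF difference; it also checks that the total area under the PDF is 1.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

a, b = -1.0, 2.0

# Area under the standard normal PDF between a and b, by numerical integration
area, abs_error = quad(norm.pdf, a, b)

# The same probability from the CDF
via_cdf = norm.cdf(b) - norm.cdf(a)

# The total area under any PDF equals 1
total, _ = quad(norm.pdf, -np.inf, np.inf)

print(f'integral of PDF over [{a}, {b}] = {area:.6f}')
print(f'F(b) - F(a) = {via_cdf:.6f}')
print(f'total area under the PDF = {total:.6f}')
```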
