Cumulative Distribution Functions and Probability Density Functions | Additional Statements From The Probability Theory
Probability Theory Mastering

Cumulative Distribution Function (CDF)

The Cumulative Distribution Function (CDF) gives the probability that a random variable takes a value less than or equal to a given value.

Mathematically, the CDF of a random variable X, denoted as F(x), is defined as:

$$F(x) = P\{X \le x\},$$

that is, the probability that X takes a value less than or equal to x.

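The CDF is defined for every random variable, discrete or continuous. It is especially convenient for continuous random variables, where any single value has probability zero.
The example below uses a normally distributed random variable and computes its CDF with the .cdf() method from scipy.stats.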

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Generate a random variable following a normal distribution
mu = 0  # mean
sigma = 1  # standard deviation
x = np.linspace(-5, 5, 100)  # x values
rv = norm(loc=mu, scale=sigma)  # create a normal distribution with given mean and standard deviation

# Compute the CDF for the random variable
cdf = rv.cdf(x)

# Plot the CDF
plt.plot(x, cdf, label='CDF')
plt.xlabel('X')
plt.ylabel('CDF')
plt.title('CDF of a Standard Normal Distribution')
plt.legend()
plt.show()
```
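F(x) is defined as P{X ≤ x}, so it can also be estimated directly from data. The fraction of observations that do not exceed x approximates F(x).

The sketch below makes some assumptions. It draws 100,000 standard normal samples with an arbitrary seed. `empirical_cdf` is a hypothetical helper written only for this illustration; it is not a SciPy function.

```python
import numpy as np
from scipy.stats import norm

# Draw samples from the standard normal distribution (arbitrary seed for reproducibility)
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0, scale=1, size=100_000)

def empirical_cdf(data, x):
    """Hypothetical helper: fraction of observations <= x, an estimate of F(x) = P{X <= x}."""
    return np.mean(data <= x)

for x in [-1.0, 0.0, 1.5]:
    estimate = empirical_cdf(samples, x)  # estimate from the sample
    exact = norm.cdf(x)                   # exact value from SciPy
    print(f"x = {x:5.2f}   empirical F(x) = {estimate:.4f}   norm.cdf(x) = {exact:.4f}")
```

With this many samples, the empirical values typically agree with norm.cdf to about two or three decimal places.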

Using the CDF, we can find the probability that a random variable falls in any interval of interest. Let X be a random variable with CDF F(x). For a < b:

$$P\{a < X \le b\} = F(b) - F(a).$$

If X is continuous, every single point has probability zero. Including or excluding the endpoints therefore changes nothing, and $P\{X \in [a, b]\} = F(b) - F(a)$ as well.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Generate a random variable following a normal distribution
mu = 0  # mean
sigma = 1  # standard deviation
rv = norm(loc=mu, scale=sigma)

# Calculate probabilities for different ranges
print('Normally distributed variable belongs to [-1, 1] with probability:', round(rv.cdf(1) - rv.cdf(-1), 3))
print('Normally distributed variable belongs to [-2, 2] with probability:', round(rv.cdf(2) - rv.cdf(-2), 3))
print('Normally distributed variable belongs to [-3, 3] with probability:', round(rv.cdf(3) - rv.cdf(-3), 3))
```
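Strictly speaking, F(b) - F(a) equals P{a < X ≤ b}: the left endpoint a is excluded. For a continuous variable this makes no difference, since P{X = a} = 0. For a discrete variable it does.

The sketch below assumes a binomial variable with n = 10 and p = 0.5, an example chosen only for illustration. It compares three ways of computing P{3 ≤ X ≤ 5}.

```python
from scipy.stats import binom, norm

# Continuous case: single points have probability 0,
# so F(b) - F(a) is the same with or without the endpoints
rv = norm(loc=0, scale=1)
print("Normal, P{-1 <= X <= 1}:", round(rv.cdf(1) - rv.cdf(-1), 4))

# Discrete case (assumed example): X ~ Binomial(n=10, p=0.5) takes integer values only
X = binom(n=10, p=0.5)
a, b = 3, 5
direct = sum(X.pmf(k) for k in range(a, b + 1))  # P{3 <= X <= 5} summed from the PMF
naive = X.cdf(b) - X.cdf(a)                       # P{3 < X <= 5}: misses X = 3
corrected = X.cdf(b) - X.cdf(a - 1)               # P{2 < X <= 5}: includes X = 3
print("Binomial, sum of PMF over [3, 5]:", round(direct, 4))
print("Binomial, F(5) - F(3):", round(naive, 4))
print("Binomial, F(5) - F(2):", round(corrected, 4))
```

For integer-valued variables, subtracting F(a - 1) instead of F(a) recovers the closed interval.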

Percent Point Function (PPF)

The Percent Point Function (PPF), also known as the quantile function, is the inverse of the cumulative distribution function (CDF). For a continuous distribution, given a probability q it returns the value x for which F(x) = q. This is the value that the random variable does not exceed with probability q. In SciPy it is implemented as the .ppf() method:

```python
from scipy.stats import norm

# Define probabilities
probabilities = [0.1, 0.5, 0.85]

# Iterate over each probability and print the corresponding value of the variable
for i in probabilities:
    # Calculate the value of the variable using the percent point function (inverse of the cumulative distribution function)
    value = norm.ppf(i)
    # Round the value to 3 decimal places for clarity
    value = round(value, 3)
    # Print the result
    print('Normally distributed variable is less than', value, 'with probability', i)
```
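Since the PPF inverts the CDF, applying .cdf() to the output of .ppf() should return the original probabilities. The sketch below checks this round trip for the same probabilities as above.

It also uses .ppf() to find a central interval containing 95% of the probability. The 95% level is an arbitrary choice for illustration.

```python
import numpy as np
from scipy.stats import norm

probabilities = np.array([0.1, 0.5, 0.85])

# PPF maps each probability q to the value x with F(x) = q
quantiles = norm.ppf(probabilities)
# Applying the CDF to those values recovers the original probabilities
recovered = norm.cdf(quantiles)
print("quantiles:  ", np.round(quantiles, 3))
print("cdf(ppf(q)):", np.round(recovered, 3))

# Central interval holding 95% of the probability: cut 2.5% from each tail
lower, upper = norm.ppf(0.025), norm.ppf(0.975)
print("95% central interval:", round(lower, 3), round(upper, 3))
```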

Probability Density Function (PDF)

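The Probability Density Function (PDF), usually written f(x), describes how the probability of a continuous random variable is spread over its range. The value f(x) is the probability density at the point x, and wherever the CDF is differentiable, f(x) = F'(x). Its role is similar to that of the PMF for discrete variables, with one key difference: f(x) is not itself a probability. For a continuous variable P{X = x} = 0 for every x, and f(x) can even exceed 1.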

The PDF defines the shape of the probability distribution of a continuous random variable.
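Two properties of the PDF can be checked numerically. It is the derivative of the CDF, and, unlike a probability, it can exceed 1.

The sketch below approximates F'(x) with a central finite difference; the step size h = 1e-5 is an assumption. It also evaluates the density of a narrow normal distribution with an assumed standard deviation of 0.1.

```python
from scipy.stats import norm

# The PDF is the derivative of the CDF: approximate F'(x) with a central finite difference
h = 1e-5  # assumed small step size
for x in [-1.0, 0.0, 2.0]:
    slope = (norm.cdf(x + h) - norm.cdf(x - h)) / (2 * h)
    print(f"x = {x:4.1f}   finite-difference F'(x) = {slope:.6f}   norm.pdf(x) = {norm.pdf(x):.6f}")

# A density is not a probability: a narrow normal distribution has f(0) > 1
narrow = norm(loc=0, scale=0.1)  # assumed standard deviation of 0.1
print("PDF at 0 for sigma = 0.1:", round(narrow.pdf(0), 3))
```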

The following example computes the PDF of the standard normal distribution with the .pdf() method.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Generate x values for plotting
x = np.linspace(-3, 3, 100)

# Calculate the probability density function (PDF) values for the standard normal distribution
pdf_values = norm.pdf(x, loc=0, scale=1)

# Plot the PDF
plt.plot(x, pdf_values, label='PDF')  # Plot PDF values against x values
plt.xlabel('X')  # Label for x-axis
plt.ylabel('PDF')  # Label for y-axis
plt.title('PDF of a Standard Normal Distribution')  # Title of the plot
plt.legend()  # Show legend
plt.show()  # Display the plot
```

The PDF shows where the values of a random variable are concentrated. A higher PDF value means the variable is more likely to fall near that point, and a lower value means it is less likely. Precisely, for a small interval width Δx, $P\{x \le X \le x + \Delta x\} \approx f(x)\,\Delta x$, where f is the PDF.
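To see what a higher PDF value means in terms of probability, the sketch below compares two short intervals of the same width for the standard normal distribution. One interval starts at x = 0 and the other at x = 2.

For a small width dx (0.01 here, an arbitrary choice), the probability of each interval is close to f(x)·dx. The interval where the PDF is larger is therefore correspondingly more likely.

```python
from scipy.stats import norm

dx = 0.01  # width of each short interval (arbitrary small value)
results = {}
for x in [0.0, 2.0]:
    exact = norm.cdf(x + dx) - norm.cdf(x)  # P{x <= X <= x + dx} from the CDF
    approx = norm.pdf(x) * dx               # density at x times the interval width
    results[x] = exact
    print(f"x = {x}: P = {exact:.6f}, f(x) * dx = {approx:.6f}")

# The interval near 0 is roughly pdf(0) / pdf(2) times more likely than the one near 2
print("ratio of probabilities:", round(results[0.0] / results[2.0], 2))
print("ratio of PDF values:   ", round(norm.pdf(0.0) / norm.pdf(2.0), 2))
```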

For a discrete variable, the probability of falling within a range is the sum of the PMF over all values in that range. A continuous variable can take infinitely many values in any range, so the sum is replaced by an integral. The probability is the area under the PDF curve over the range:

$$P\{a \le X \le b\} = \int_a^b f(x)\,dx = F(b) - F(a),$$

where f is the PDF and F is the CDF.
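The equality between the area under the PDF and the difference of CDF values can be checked with numerical integration. The sketch below uses SciPy's general-purpose integrator scipy.integrate.quad, which returns the integral together with an estimate of its absolute error.

It integrates the standard normal PDF over [-1, 1] and over the whole real line. The interval [-1, 1] is an arbitrary choice.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

a, b = -1, 1
# Numerically integrate the PDF over [a, b]; quad returns (value, estimated absolute error)
area, abs_err = quad(norm.pdf, a, b)
# The same probability from the CDF
via_cdf = norm.cdf(b) - norm.cdf(a)
print("Area under the PDF on [-1, 1]:", round(area, 4))
print("F(1) - F(-1):", round(via_cdf, 4))

# The total area under any PDF is 1
total, _ = quad(norm.pdf, -np.inf, np.inf)
print("Total area under the PDF:", round(total, 4))
```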

Is the following statement true? The area under the PDF curve between two points equals the probability that the random variable falls within that range.

Section 1. Chapter 3