Advanced Probability Theory
Useful Properties of the Gaussian Distribution
The Gaussian distribution (also called the normal distribution) is one of the most important distributions in probability theory and statistics. In this chapter we will look at some of its useful properties to understand why it is so important and how it is applied in practice.
Physical meaning of the Gaussian distribution
The Gaussian distribution can describe a random variable that results from many different factors adding up.
For example, when weighing something, various factors like temperature, pressure, and measurement errors affect the result. Individually, these factors don't matter much, but together they have a significant impact. This is explained further in the chapter on the Central Limit Theorem.
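To make this concrete, here is a small simulation sketch. It assumes each measurement is disturbed by 50 independent small errors, each uniform on [-0.5, 0.5]; these numbers, the seed, and the variable names are illustrative choices, not part of the course material. None of the individual errors is Gaussian, yet their sum behaves very much like a Gaussian variable.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary fixed seed for reproducibility

n_factors = 50           # hypothetical number of small disturbances per measurement
n_measurements = 20_000  # hypothetical number of repeated measurements

# Each factor is a small uniform error in [-0.5, 0.5]; individually not Gaussian.
factors = rng.uniform(-0.5, 0.5, size=(n_measurements, n_factors))
total_error = factors.sum(axis=1)

# Standardize the sum: a uniform variable on [-0.5, 0.5] has variance 1/12,
# so the sum of n_factors independent copies has variance n_factors / 12.
z = total_error / np.sqrt(n_factors / 12)

# For a standard Gaussian, about 68.3% of values lie in [-1, 1]
# and about 95.4% lie in [-2, 2].
share_within_1 = np.mean(np.abs(z) <= 1)
share_within_2 = np.mean(np.abs(z) <= 2)
print(f"within 1 sigma: {share_within_1:.3f}, within 2 sigma: {share_within_2:.3f}")
```

The printed shares should land close to the Gaussian values, even though every single factor is uniformly distributed.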
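From here on we will use the standard notation for Gaussian quantities: a Gaussian random variable X with mean μ and variance σ² is written X ~ N(μ, σ²), and a Gaussian vector with mean vector μ and covariance matrix Σ is written X ~ N(μ, Σ).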
Linear transformations of Gaussian vectors
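The Gaussian distribution is preserved under linear transformations: if we apply a linear transformation to a Gaussian vector, the result is again a Gaussian vector, but with different parameters. Specifically, if X is Gaussian with mean vector μ and covariance matrix Σ, then Y = AX + b is Gaussian with mean Aμ + b and covariance matrix AΣAᵀ.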
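The sketch below checks the parameters of a transformed Gaussian vector numerically. The matrix A, the shift b, and the parameters of X are made-up illustrative values. The code only compares the sample mean and covariance of Y = AX + b with Aμ + b and AΣAᵀ; it does not test that Y is Gaussian, which is the property stated above.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary fixed seed

# Illustrative parameters of a 2-D Gaussian vector X ~ N(mean_x, cov_x)
mean_x = np.array([1.0, -2.0])
cov_x = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

# Illustrative linear transformation Y = A X + b
A = np.array([[1.0, 2.0],
              [0.5, -1.0]])
b = np.array([3.0, 0.0])

# Theoretical parameters of Y
mean_y = A @ mean_x + b      # A mu + b
cov_y = A @ cov_x @ A.T      # A Sigma A^T

# Empirical check: sample X, transform every sample, estimate the moments of Y
x = rng.multivariate_normal(mean_x, cov_x, size=200_000)
y = x @ A.T + b              # applies A to each row of x, then shifts by b
emp_mean_y = y.mean(axis=0)
emp_cov_y = np.cov(y, rowvar=False)

print("theoretical mean:", mean_y, "empirical mean:", emp_mean_y)
print("theoretical covariance:\n", cov_y)
print("empirical covariance:\n", emp_cov_y)
```

With these values the theoretical mean is [0, 2.5] and the theoretical covariance is [[8.4, -1.0], [-1.0, 0.9]]; the empirical estimates should agree up to sampling noise.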
Uncorrelated Gaussian variables are independent
Correlation captures only linear dependence between variables, so in general two variables can be dependent and still uncorrelated. For jointly Gaussian variables, however, zero correlation does imply independence, which is another very useful property of the Gaussian distribution. Note that the variables must be jointly Gaussian (that is, together they form a Gaussian vector); it is not enough for each of them to be Gaussian on its own.
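One way to see this property, assuming the two variables are jointly Gaussian, is to check that their joint density factorizes into the product of the marginal densities when the covariance is zero. The sketch below does this on a grid of points; the means, standard deviations, grid, and the contrast value rho = 0.8 are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# Illustrative parameters: two jointly Gaussian variables with zero covariance
mu = np.array([1.0, -0.5])
sigma = np.array([2.0, 0.7])   # standard deviations
cov = np.diag(sigma ** 2)      # zero off-diagonal entries: uncorrelated

# Grid of evaluation points, shape (625, 2)
xs, ys = np.meshgrid(np.linspace(-4, 6, 25), np.linspace(-3, 2, 25))
points = np.column_stack([xs.ravel(), ys.ravel()])

# Joint density versus product of the two marginal densities
joint_pdf = multivariate_normal(mean=mu, cov=cov).pdf(points)
product_pdf = (norm.pdf(points[:, 0], mu[0], sigma[0])
               * norm.pdf(points[:, 1], mu[1], sigma[1]))
max_abs_diff = np.max(np.abs(joint_pdf - product_pdf))

# Contrast: with non-zero correlation the joint density no longer factorizes
rho = 0.8
cov_corr = np.array([[sigma[0] ** 2, rho * sigma[0] * sigma[1]],
                     [rho * sigma[0] * sigma[1], sigma[1] ** 2]])
joint_corr = multivariate_normal(mean=mu, cov=cov_corr).pdf(points)
max_diff_corr = np.max(np.abs(joint_corr - product_pdf))

print("uncorrelated, max |joint - product|:", max_abs_diff)
print("correlated,   max |joint - product|:", max_diff_corr)
```

In the uncorrelated case the difference is at the level of floating-point rounding, which is exactly the definition of independence for variables with densities. In the correlated case the difference is clearly non-zero.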
3-sigma rule
The 3-sigma rule, also known as the empirical rule or the 68-95-99.7 rule, is a statistical guideline that states that for a normal distribution:
- Approximately 68% of the data falls within one standard deviation (σ) of the mean (μ);
- Approximately 95% of the data falls within two standard deviations (2σ) of the mean (μ);
- Approximately 99.7% of the data falls within three standard deviations (3σ) of the mean (μ).
This rule can be very useful for detecting outliers in data that follows a Gaussian distribution.
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

# Generate some data from a normal distribution
mu = 0     # mean
sigma = 1  # standard deviation
x = np.linspace(mu - 4*sigma, mu + 4*sigma, 100)
y = norm.pdf(x, mu, sigma)

# Plot the PDF of the normal distribution
plt.plot(x, y, label='PDF')

# Shade the area within 1, 2, and 3 standard deviations of the mean
plt.fill_between(x, 0, y, where=(x >= mu-sigma) & (x <= mu+sigma), alpha=0.3, label='68%')
plt.fill_between(x, 0, y, where=(x >= mu-2*sigma) & (x <= mu+2*sigma), alpha=0.3, label='95%')
plt.fill_between(x, 0, y, where=(x >= mu-3*sigma) & (x <= mu+3*sigma), alpha=0.3, label='99.7%')

# Add a legend and labels
plt.legend()
plt.xlabel('X')
plt.ylabel('PDF')
plt.title('3-Sigma Rule for a Gaussian Distribution')

# Show the plot
plt.show()
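The percentages in the rule can also be computed directly from the Gaussian CDF, and the same 3σ threshold gives a simple outlier check. The sketch below is a minimal illustration under stated assumptions: the sample (mean 10, standard deviation 2), the seed, and the three injected anomalies are made up, and flagging points farther than three sample standard deviations from the sample mean is just one straightforward reading of the rule, not a prescribed method.

```python
import numpy as np
from scipy.stats import norm

# Exact probabilities P(|X - mu| <= k * sigma) for k = 1, 2, 3.
# They do not depend on mu and sigma, so the standard normal CDF is enough.
coverage = {k: norm.cdf(k) - norm.cdf(-k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within {k} sigma: {p:.4f}")  # 0.6827, 0.9545, 0.9973

# Illustrative data: a Gaussian sample with three hypothetical anomalies injected
rng = np.random.default_rng(seed=2)  # arbitrary fixed seed
data = rng.normal(loc=10.0, scale=2.0, size=1_000)
data[:3] = [25.0, -4.0, 19.5]

# Flag points farther than 3 sample standard deviations from the sample mean
mu_hat = data.mean()
sigma_hat = data.std(ddof=1)
is_outlier = np.abs(data - mu_hat) > 3 * sigma_hat
outliers = data[is_outlier]
print("flagged values:", outliers)
```

Because about 0.3% of genuinely Gaussian points also fall outside the 3σ band, a few ordinary values may be flagged along with the injected ones. Extreme values also inflate the estimated standard deviation, so the check works best when anomalies are rare.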