Advanced Probability Theory
Confidence Intervals for Population Parameters
In the previous chapters we considered how to estimate population parameters and how to assess the quality of those estimates. Those were point estimates: we determined a single plausible value of the parameter from the available data. There is another approach: we can construct an interval that, with a given probability, covers the true value of the parameter. Such an interval is called a confidence interval. Formally, for a confidence level $\gamma \in (0, 1)$, a confidence interval for a parameter $\theta$ is an interval $[L, U]$ whose endpoints are computed from the sample and satisfy $P(L \le \theta \le U) = \gamma$.
The principle of constructing confidence intervals is similar to that of constructing point estimates. We again take a function whose arguments are our samples. Then we use the distribution law of this function to build an interval. A rigorous mathematical treatment of this process can be quite complicated, so we will not go into it in more detail.
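As an illustration of this principle, here is a sketch for the simplest case. It assumes i.i.d. Gaussian samples $X_1, \dots, X_n$ with unknown mean $\mu$ and known standard deviation $\sigma$; the symbols $\mu$, $\sigma$, $\gamma$, $Z$, and $z$ are notation chosen for this sketch. The function of the samples is the standardized sample mean, whose distribution does not depend on $\mu$:

```latex
Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim \mathcal{N}(0, 1),
\qquad
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i .

% Choose z as the standard Gaussian quantile of order (1 + \gamma) / 2:
P(-z \le Z \le z) = \gamma,
\qquad
z = \Phi^{-1}\!\left( \frac{1 + \gamma}{2} \right).

% Solving the inequality for \mu gives the interval:
P\!\left( \bar{X} - z \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z \frac{\sigma}{\sqrt{n}} \right) = \gamma .
```

The endpoints depend on the sample through $\bar{X}$, so they are random, while $\mu$ stays fixed.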
Note
It's worth noting that there is another type of interval estimate for population parameters, called the credible interval, which is constructed using Bayes' theorem. The two kinds of interval have different interpretations:
The confidence interval is essentially an interval with random endpoints that, with a certain probability, covers the true constant value of the parameter;
In contrast, the credible interval is a constant interval where the random value of the desired parameter falls with a certain probability.
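The confidence-interval interpretation can be checked with a small simulation. The sketch below is illustrative only: the population parameters, sample size, and number of experiments are hypothetical values, and it uses the Student-based interval for the mean that is described later in this chapter. It draws many independent samples, builds one interval per sample, and counts how often the fixed true mean falls inside.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical population parameters and experiment sizes (assumptions)
true_mean, true_std = 5.0, 2.0
n, n_experiments = 30, 10_000
conf_level = 0.95

# Each row is one independent sample of size n
samples = rng.normal(true_mean, true_std, size=(n_experiments, n))

# Per-sample mean and standard error (adjusted sample std / sqrt(n))
means = samples.mean(axis=1)
sems = samples.std(axis=1, ddof=1) / np.sqrt(n)

# Student quantile for the chosen confidence level
t_value = stats.t.ppf((1 + conf_level) / 2, df=n - 1)

# Random endpoints: they change from sample to sample
lower = means - t_value * sems
upper = means + t_value * sems

# Fraction of intervals that cover the fixed true mean
covered = (lower <= true_mean) & (true_mean <= upper)
coverage = covered.mean()
print(f'Fraction of intervals covering the true mean: {coverage:.3f}')
```

The printed fraction should be close to the chosen confidence level: the parameter stays the same in every experiment, and only the intervals vary.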
Confidence interval for Gaussian distribution expectation parameter
Let's look at how to build a confidence interval for the expectation (mean) parameter of a Gaussian distribution. We will consider two situations: the variance is known, and the variance is unknown.
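If the variance $\sigma^2$ is known, the confidence interval for the expectation is built from the sample mean $\bar{X}$ and the PPF (quantile function) of the standard Gaussian distribution: $\bar{X} \pm z_{(1+\gamma)/2} \, \sigma / \sqrt{n}$, where $\gamma$ is the confidence level, $n$ is the sample size, and $z_p$ is the value of the standard Gaussian PPF at $p$.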
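A minimal sketch of the known-variance case follows. The helper name `z_confidence_interval`, the synthetic data, and the value of `sigma` are assumptions made for illustration, not part of the course material.

```python
import numpy as np
from scipy import stats


def z_confidence_interval(data, sigma, conf_level=0.95):
    """Interval for the mean of Gaussian data with known std `sigma`."""
    data = np.asarray(data, dtype=float)
    n = data.size
    # Quantile of the standard Gaussian distribution
    z_value = stats.norm.ppf((1 + conf_level) / 2)
    margin = z_value * sigma / np.sqrt(n)
    mean = data.mean()
    return mean - margin, mean + margin


# Synthetic Gaussian data with a known standard deviation (hypothetical)
rng = np.random.default_rng(42)
sigma = 3.0
data = rng.normal(loc=10.0, scale=sigma, size=100)

lower, upper = z_confidence_interval(data, sigma, conf_level=0.95)
print(f'95% confidence interval for the mean: ({lower:.2f}, {upper:.2f})')
```

Because `sigma` is known, the half-width `z_value * sigma / np.sqrt(n)` does not depend on the data; only the center of the interval does.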
If the variance is unknown, we replace it with the adjusted (unbiased) sample variance $s^2$ and use the PPF of the Student's t-distribution with $n-1$ degrees of freedom: $\bar{X} \pm t_{n-1,\,(1+\gamma)/2} \, s / \sqrt{n}$.
Confidence interval with Python
Let's now look at how to build a confidence interval for the mean of Gaussian samples in Python. We will use several confidence levels and compare the resulting intervals.
```python
import numpy as np
from scipy import stats
import pandas as pd

# Load the dataset
samples = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/Advanced+Probability+course+media/gaussian_samples.csv', names=['Value'])
data = np.array(samples)

# Calculate the degrees of freedom
n = len(data)
df = n - 1

# Build confidence intervals with different confidence levels
for conf_level in [0.9, 0.95, 0.99]:
    # Calculate the t-value for the given confidence level and degrees of freedom
    t_value = stats.t.ppf((1 + conf_level) / 2, df)

    # Calculate the sample mean and adjusted sample variance
    mean = np.mean(data)
    adjusted_var = np.var(data, ddof=1)

    # Calculate the lower and upper bounds of the confidence interval
    lower_bound = mean - t_value * np.sqrt(adjusted_var) / np.sqrt(n)
    upper_bound = mean + t_value * np.sqrt(adjusted_var) / np.sqrt(n)

    # Print the result
    print(f'{conf_level:.0%} confidence interval for mean value is: ({lower_bound:.2f}, {upper_bound:.2f})')
```
We see that the higher the confidence level, the wider the interval we get. This is quite logical, since the wider the interval, the higher the probability that this interval covers the real value of the mean.
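The relationship between confidence level and width can also be checked numerically. The sketch below uses synthetic Gaussian data as a stand-in for the course dataset (an assumption, so that it runs without a download) and compares the manually computed interval with SciPy's `stats.t.interval`, which should give the same bounds.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the course dataset (assumption)
rng = np.random.default_rng(1)
data = rng.normal(loc=0.0, scale=1.0, size=50)

n = data.size
mean = data.mean()
# Standard error of the mean based on the adjusted sample variance
sem = data.std(ddof=1) / np.sqrt(n)

widths = []
for conf_level in [0.9, 0.95, 0.99]:
    # Manual interval from the Student quantile
    t_value = stats.t.ppf((1 + conf_level) / 2, n - 1)
    manual = (mean - t_value * sem, mean + t_value * sem)

    # Same interval from SciPy's built-in helper
    builtin = stats.t.interval(conf_level, n - 1, loc=mean, scale=sem)
    assert np.allclose(manual, builtin)

    widths.append(manual[1] - manual[0])
    print(f'{conf_level:.0%}: width = {widths[-1]:.3f}')
```

The width grows with the confidence level because the quantile `t_value` grows, while `mean` and `sem` stay the same for a given sample.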