Probability Theory Mastering
Consistent Estimation
In statistics, a consistent estimator is one that converges in probability to the true value of the parameter as the sample size increases: the more data is collected, the less likely the estimate is to deviate noticeably from the true value. Formally, it can be described as follows:
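In the usual notation (assumed here: θ is the true parameter and θ̂_n is the estimator built from n observations), consistency is commonly written as convergence in probability:

```latex
\forall \varepsilon > 0: \quad
\lim_{n \to \infty} P\left( \left| \hat{\theta}_n - \theta \right| > \varepsilon \right) = 0
```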
This definition may seem rather complicated, and in practice it is not always easy to check consistency directly from it. That is why we introduce a simpler applied criterion of consistency:
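In symbols (the notation θ̂_n for the estimator built from n observations and θ for the true parameter is an assumption made here), the criterion can be sketched as:

```latex
\lim_{n \to \infty} E\left[ \hat{\theta}_n \right] = \theta
\quad \text{and} \quad
\lim_{n \to \infty} \operatorname{Var}\left( \hat{\theta}_n \right) = 0
\;\Longrightarrow\;
\hat{\theta}_n \text{ is consistent}
```

It follows from Chebyshev's inequality: P(|θ̂_n − θ| > ε) ≤ E[(θ̂_n − θ)²] / ε², and the mean squared error on the right equals the variance plus the squared bias, both of which tend to zero.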
Thus, if our estimator is unbiased (or at least asymptotically unbiased) and its variance tends to zero as the sample size increases, then the estimator is consistent. Note that it is not enough for the variance to merely decrease: it has to vanish in the limit.
Let's show that the sample mean and the adjusted sample variance are consistent estimators.
Sample mean estimation
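The sample mean is consistent not by definition but as a consequence of the law of large numbers: the more independent observations we average, the closer the resulting value tends to the mathematical expectation. The criterion above gives the same result when the variance σ² is finite: the sample mean is unbiased, and its variance σ²/n tends to zero as n grows.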
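This behavior can be illustrated with a short simulation. The sketch below reuses the normal distribution with mean 2 and standard deviation 2 from this chapter; the fixed seed, the sample size of 100,000, and the name `running_mean` are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
true_mean = 2.0
samples = rng.normal(true_mean, 2, 100_000)

# Running sample mean: element k-1 is the mean of the first k observations
running_mean = np.cumsum(samples) / np.arange(1, samples.size + 1)

# Print the estimate and its absolute error for growing sample sizes
for n in (10, 100, 1_000, 10_000, 100_000):
    estimate = running_mean[n - 1]
    print(f"n = {n:>6}: sample mean = {estimate:.4f}, "
          f"abs error = {abs(estimate - true_mean):.4f}")
```

The absolute error does not have to shrink at every step, but it is typically much smaller for large n than for small n.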
Adjusted sample variance estimation
To check the consistency of the adjusted sample variance, let's use a simulation:
```python
import numpy as np
import matplotlib.pyplot as plt

# Generate 5000 samples from a normal distribution with mean 2 and standard deviation 2
samples = np.random.normal(2, 2, 5000)

# Adjusted variance of the first `subsample_size` elements of `data`
def adjusted_variance_value(data, subsample_size):
    # ddof=1 applies Bessel's correction (divide by n - 1 instead of n)
    return data[:subsample_size].var(ddof=1)

# Subsample sizes from 2 to 4999 (the adjusted variance needs at least 2 elements)
x = np.arange(2, 5000)
y = np.zeros(4998)  # Array to store the calculated variances
for i in range(4998):
    y[i] = adjusted_variance_value(samples, x[i])

# Plotting the results
plt.plot(x, y, label='Estimated adjusted variance')
plt.xlabel('Number of elements to calculate variance')
plt.ylabel('Variance')
plt.axhline(y=4, color='k', label='Real variance')  # Real variance is 2**2 = 4
plt.legend()
plt.show()
```
According to the visualization, as the number of elements increases, the adjusted sample variance approaches its real value of 4, which is the behavior we expect from a consistent estimator.
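The applied criterion can also be checked numerically for the adjusted sample variance. The sketch below draws many independent samples of each size and estimates the mean and the variance of the estimator itself; the helper `estimator_mean_and_variance`, the number of repetitions, and the seed are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
true_variance = 4.0             # variance of a normal distribution with std 2
n_repeats = 2_000               # independent samples per size (arbitrary choice)

def estimator_mean_and_variance(sample_size):
    # Draw n_repeats independent samples of the given size, one per row
    data = rng.normal(2, 2, size=(n_repeats, sample_size))
    # Adjusted variance (Bessel's correction) of each row
    estimates = data.var(axis=1, ddof=1)
    # Empirical mean and variance of the estimator across repetitions
    return estimates.mean(), estimates.var()

results = {n: estimator_mean_and_variance(n) for n in (10, 100, 1_000)}
for n, (mean_est, var_est) in results.items():
    print(f"n = {n:>5}: mean of estimates = {mean_est:.3f}, "
          f"variance of estimates = {var_est:.4f}")
```

The mean of the estimates should stay close to 4 for every sample size (unbiasedness), while their variance should shrink toward zero. For normal data the variance of the adjusted sample variance is 2σ⁴/(n − 1), so it decays roughly in proportion to 1/n.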