Sampling and Central Limit Theorem


Sampling is a core practice in statistics because you rarely have access to data on an entire population. Instead, you collect a subset, or sample, and use it to make inferences about the whole. The process of sampling introduces variability, but careful sampling methods help ensure that your sample represents the population well.
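To see the sampling variability described above, the sketch below draws several random samples from one synthetic population and compares their means. The exponential population, the sample size of 50, and the seed are illustrative assumptions, not values taken from a real dataset.

```python
import numpy as np

# Illustrative (assumed) population: 100,000 values from a skewed exponential distribution
rng = np.random.default_rng(seed=0)
population = rng.exponential(scale=2.0, size=100_000)

# Draw five simple random samples of size 50, without replacement
sample_size = 50
sample_means = [
    rng.choice(population, size=sample_size, replace=False).mean()
    for _ in range(5)
]

print(f"Population mean: {population.mean():.3f}")
for i, m in enumerate(sample_means, start=1):
    print(f"Sample {i} mean: {m:.3f}")
```

Each sample mean lands near the population mean but not exactly on it; that sample-to-sample scatter is the variability that careful sampling methods and larger samples help keep under control.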

The Central Limit Theorem (CLT) is one of the most important results in statistics. It states that if you repeatedly draw independent random samples of size n from a population with a finite mean and finite variance, the distribution of the sample means approaches a normal (bell-shaped) distribution as n grows. This holds regardless of the shape of the population's distribution: the population itself does not need to be normally distributed.
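In symbols, under the standard assumption of independent, identically distributed observations X_1, ..., X_n with mean μ and finite variance σ², the standardized sample mean converges in distribution to the standard normal:

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1) \quad \text{as } n \to \infty,
\qquad
\text{so for large } n:\quad \bar{X}_n \approx \mathcal{N}\!\left(\mu, \frac{\sigma^2}{n}\right).
```

The quantity σ/√n is the standard error of the mean. For an exponential population with scale 2, μ = σ = 2, so with n = 30 the sample means should cluster around 2 with a standard deviation of roughly 2/√30 ≈ 0.37.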

The implications of the Central Limit Theorem are far-reaching. It lets you apply normal probability theory to sample means even when the underlying data are not normal, which is why the normal distribution appears so often in statistics and why sample means are so useful for estimating population parameters. In particular, the CLT justifies many standard confidence intervals and hypothesis tests, which assume that a sample statistic is approximately normally distributed once the sample size is large enough.
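As an illustration of a CLT-based procedure, the sketch below computes an approximate 95% confidence interval for a population mean from a single sample, using the normal multiplier 1.96 and the estimated standard error s/√n. The data are synthetic (an assumed exponential sample of 40 observations), and the normal-approximation interval is one common choice rather than the only one.

```python
import numpy as np

# Synthetic (assumed) data: one sample of 40 observations from a skewed distribution
rng = np.random.default_rng(seed=1)
sample = rng.exponential(scale=2.0, size=40)

n = sample.size
mean = sample.mean()
# Estimated standard error of the mean, using the sample standard deviation (ddof=1)
std_error = sample.std(ddof=1) / np.sqrt(n)

# 1.96 is the 97.5th percentile of the standard normal distribution,
# so mean +/- 1.96 * SE is an approximate 95% confidence interval justified by the CLT
z = 1.96
ci_low, ci_high = mean - z * std_error, mean + z * std_error

print(f"Sample mean: {mean:.3f}")
print(f"Approximate 95% CI: ({ci_low:.3f}, {ci_high:.3f})")
```

With small samples, practitioners often replace 1.96 with a quantile of Student's t-distribution (for example `scipy.stats.t.ppf(0.975, df=n - 1)`), which gives a slightly wider interval.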

import numpy as np
import matplotlib.pyplot as plt

# Simulate a non-normal population (exponential distribution with mean 2)
# A seeded generator makes the results reproducible
rng = np.random.default_rng(seed=42)
population = rng.exponential(scale=2, size=100000)

# Parameters for sampling
sample_size = 30
num_samples = 1000
sample_means = []

# Take repeated samples (without replacement) and compute their means
for _ in range(num_samples):
    sample = rng.choice(population, size=sample_size, replace=False)
    sample_means.append(np.mean(sample))

# Visualize the population and the distribution of sample means
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Population distribution
axes[0].hist(population, bins=50, color="skyblue", edgecolor="black")
axes[0].set_title("Population Distribution (Exponential)")
axes[0].set_xlabel("Value")
axes[0].set_ylabel("Frequency")

# Distribution of sample means
axes[1].hist(sample_means, bins=30, color="salmon", edgecolor="black")
axes[1].set_title("Distribution of Sample Means (n=30)")
axes[1].set_xlabel("Sample Mean")
axes[1].set_ylabel("Frequency")

plt.tight_layout()
plt.show()
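The histograms show the change in shape qualitatively. As a quantitative check, the sketch below repeats the simulation for several sample sizes and compares the standard deviation of the sample means with the σ/√n predicted by the CLT, along with their skewness, which should shrink toward 0 as n grows. The helper functions `simulate_sample_means` and `skewness` are hypothetical names introduced for this illustration, and the sample sizes and 2,000 repetitions are arbitrary choices.

```python
import numpy as np

# Same kind of population as the simulation above: exponential with scale=2
rng = np.random.default_rng(seed=42)
population = rng.exponential(scale=2.0, size=100_000)

def simulate_sample_means(pop, sample_size, num_samples, rng):
    """Hypothetical helper: means of num_samples random samples drawn without replacement."""
    return np.array([
        rng.choice(pop, size=sample_size, replace=False).mean()
        for _ in range(num_samples)
    ])

def skewness(x):
    """Hypothetical helper: sample skewness as the average cubed z-score (0 if symmetric)."""
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 3)

sigma = population.std()
results = {}
for n in (2, 10, 30, 100):
    means = simulate_sample_means(population, n, 2000, rng)
    # Store (observed SD of means, CLT-predicted standard error, skewness of means)
    results[n] = (means.std(), sigma / np.sqrt(n), skewness(means))
    print(f"n={n:>3}: SD of means={results[n][0]:.3f}, "
          f"sigma/sqrt(n)={results[n][1]:.3f}, skewness={results[n][2]:.2f}")
```

For an exponential population the skewness of the sample mean is 2/√n, so the printed skewness should fall from roughly 1.4 at n = 2 to roughly 0.2 at n = 100, up to simulation noise.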