The Central Limit Theorem in Practice

The Central Limit Theorem (CLT) is a cornerstone of inferential statistics. It states that, regardless of the shape of the population distribution, the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, provided the observations are independent and identically distributed with finite variance. This property allows you to use normal-based inference methods for means even when the underlying data are not normally distributed. The CLT justifies the widespread use of confidence intervals and hypothesis tests based on the normal distribution, making it essential for practical data analysis.
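
Stated a little more formally, the classical form of the theorem can be sketched as follows. The symbols here are notation introduced for illustration and do not come from the lesson: X_1, ..., X_n are the independent, identically distributed observations, mu and sigma are the population mean and standard deviation, and n is the sample size.

```latex
\[
  \sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma}
  \;\stackrel{d}{\longrightarrow}\; \mathcal{N}(0, 1)
  \quad \mbox{as } n \to \infty,
  \qquad \mbox{where } \bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i .
\]
```

In practice this is usually read as: for large n, the sample mean is approximately normal with mean mu and standard deviation sigma / sqrt(n), so its spread shrinks as the sample size grows.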


library(ggplot2)

# Set seed for reproducibility
set.seed(123)

# Generate a non-normal population
population <- rexp(10000, rate = 1)

# Parameters for simulation
sample_size <- 30
num_samples <- 1000

# Draw repeated samples and compute their means
sample_means <- replicate(
  num_samples,
  mean(sample(population, sample_size, replace = TRUE))
)

# Convert to data frames
population_df <- data.frame(value = population)
sample_means_df <- data.frame(mean_value = sample_means)

# Population distribution
ggplot(population_df, aes(x = value)) +
  geom_histogram(bins = 40) +
  labs(
    title = "Population (Exponential)",
    x = "Value",
    y = "Count"
  )

# Distribution of sample means
ggplot(sample_means_df, aes(x = mean_value)) +
  geom_histogram(bins = 40) +
  labs(
    title = "Sample Means (n = 30)",
    x = "Mean Value",
    y = "Count"
  )

As the simulation shows, the original population is highly skewed because it follows an exponential distribution. The distribution of the means of repeated samples, however, is much more symmetric and bell-shaped. This is the Central Limit Theorem in action: even when the underlying data are not normal, the means of sufficiently large random samples are approximately normally distributed. This property lets you make reliable inferences about population means using normal-based statistical methods, as long as the sample size is reasonably large and the assumptions of independence and finite variance hold.
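
The visual impression can also be checked numerically. The sketch below is an illustrative addition, not part of the original lesson. It repeats the simulation setup and compares the mean and standard deviation of the sample means with the values the CLT suggests (the population mean, and the population standard deviation divided by the square root of the sample size). The names `pop_mean`, `pop_sd`, `clt_check`, and `within_band` are hypothetical helpers introduced here.

```r
# Illustrative check (assumed setup mirrors the simulation in this lesson)
set.seed(123)

population <- rexp(10000, rate = 1)
sample_size <- 30
num_samples <- 1000

sample_means <- replicate(
  num_samples,
  mean(sample(population, sample_size, replace = TRUE))
)

# Samples are drawn from the simulated population, so use its own mean and SD
pop_mean <- mean(population)
pop_sd <- sd(population)

# Simulated values next to the values suggested by the CLT
clt_check <- data.frame(
  quantity = c("mean of sample means", "SD of sample means"),
  simulated = c(mean(sample_means), sd(sample_means)),
  predicted = c(pop_mean, pop_sd / sqrt(sample_size))
)
print(clt_check)

# Share of sample means within 1.96 predicted standard errors of the
# population mean; a normal approximation suggests a value near 0.95
within_band <- mean(
  abs(sample_means - pop_mean) <= 1.96 * pop_sd / sqrt(sample_size)
)
print(within_band)
```

If the normal approximation is reasonable, the simulated and predicted columns should be close, and the share of sample means inside the band should be roughly 95%. With a skewed population and n = 30, small deviations are expected.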

Section 1. Chapter 3
