Statistics for Data Analysis

What Is CLT?


The Central Limit Theorem (CLT) is a foundational concept in statistics that explains why the normal distribution appears so frequently in practice, even when the underlying data is not normally distributed.

The theorem states that if you take a large number of independent, identically distributed (i.i.d.) random variables, each with a finite mean and variance, the distribution of their sample mean will approximate a normal (bell-shaped) distribution as the sample size becomes large, regardless of the original distribution's shape.
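
In symbols, the classical (Lindeberg-Lévy) form of this statement can be sketched as follows. The notation is introduced here for illustration: $X_1, \dots, X_n$ are the i.i.d. variables, $\mu$ and $\sigma^2 > 0$ are their common mean and variance, and $\bar{X}_n$ is the sample mean.

```latex
\[
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\frac{\sqrt{n}\,\bigl(\bar{X}_n - \mu\bigr)}{\sigma}
\;\xrightarrow{\,d\,}\; \mathcal{N}(0, 1)
\quad \text{as } n \to \infty .
\]
```

Read informally, this says that for large $n$ the sample mean behaves approximately like a normal variable with mean $\mu$ and standard deviation $\sigma / \sqrt{n}$.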

The CLT is important because it bridges probability theory and practical data analysis. Even if your data comes from a skewed or unusual distribution (such as income, waiting times, or test scores), the mean of a sufficiently large random sample from that data is approximately normally distributed, and the approximation improves as the sample size increases.

For example, if you repeatedly draw 30 adults at random and record their average height, the histogram of these averages will settle into a bell-curve shape as the repetitions accumulate, even if the original height data is not perfectly normal.

This convergence happens because the random "ups and downs" in each sample tend to cancel out, and extreme values become less likely when averaging. The larger your sample size, the closer the distribution of sample means gets to a true normal distribution. This is why, in practice, you can often use normal-based statistical tools even when your raw data is not normal.
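
One way to watch this convergence is to track the skewness of the sample means as the sample size grows. The sketch below is illustrative only: it assumes an exponential distribution with mean 2, uses only the Python standard library, and the helper name `sample_mean_skewness` and the chosen sample sizes are hypothetical.

```python
import random
import statistics


def sample_mean_skewness(sample_size, n_samples=4000, seed=0):
    """Estimate the skewness of the sample-mean distribution by simulation."""
    rng = random.Random(seed)
    # Each sample mean averages `sample_size` draws from an exponential
    # distribution with mean 2 (rate 0.5), which is strongly right-skewed.
    means = [
        statistics.fmean(rng.expovariate(0.5) for _ in range(sample_size))
        for _ in range(n_samples)
    ]
    center = statistics.fmean(means)
    spread = statistics.pstdev(means)
    # Skewness is the average cubed z-score; it is 0 for a symmetric distribution.
    return statistics.fmean(((m - center) / spread) ** 3 for m in means)


skewness_by_size = {n: sample_mean_skewness(n) for n in (1, 5, 30, 100)}
for n, skew in skewness_by_size.items():
    print(f"sample size {n:>3}: skewness of sample means = {skew:.2f}")
```

The printed skewness should shrink toward 0 as the sample size increases, which is the "cancelling out" described above showing up as a number.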

For the CLT to apply, the variables need a finite mean and variance, as stated above, and two further prerequisites must be met:

  • The variables must be independent; the outcome of one variable does not affect the others;
  • The variables must be identically distributed; each variable follows the same probability distribution with the same mean and variance.
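
The theorem statement also asks for a finite mean and variance. A rough way to see why this matters is to compare a distribution that satisfies the condition with one that does not. The sketch below is an illustration under assumptions not taken from the lesson: it uses the standard Cauchy distribution as the counterexample, measures spread with the interquartile range (IQR), and all function names are hypothetical.

```python
import math
import random
import statistics


def interquartile_range(values):
    """Width of the middle 50% of the values."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def iqr_of_sample_means(draw, sample_size, n_samples=1000, seed=1):
    """Simulate `n_samples` sample means and return their interquartile range."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(draw(rng) for _ in range(sample_size))
        for _ in range(n_samples)
    ]
    return interquartile_range(means)


def draw_exponential(rng):
    # Finite mean (2) and finite variance (4): the CLT conditions hold.
    return rng.expovariate(0.5)


def draw_cauchy(rng):
    # Standard Cauchy via the inverse CDF; its mean and variance are undefined.
    return math.tan(math.pi * (rng.random() - 0.5))


for n in (10, 400):
    print(
        f"n={n:>3}  exponential IQR={iqr_of_sample_means(draw_exponential, n):.3f}"
        f"  Cauchy IQR={iqr_of_sample_means(draw_cauchy, n):.3f}"
    )
```

For the exponential draws, the spread of the sample means should narrow as n grows. For the Cauchy draws it should stay roughly the same, because averaging does not tame a distribution without a finite variance.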

The power of the CLT is that it allows you to use normal probability methods to analyze averages and sums, even if the data itself is not normal. This has major practical implications:

  • You can confidently apply techniques like confidence intervals, hypothesis tests, and control charts to sample means or totals;
  • The normal approximation behind these techniques holds as long as your samples are large enough and the data meets the requirements above.
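
As one illustration of these implications, the sketch below builds normal-approximation confidence intervals for the mean of skewed data and counts how often they contain the true mean. It is a minimal example under stated assumptions: exponential data with mean 2, a sample size of 100, the rounded critical value 1.96, and a hypothetical helper named `normal_confidence_interval`.

```python
import math
import random
import statistics

Z_95 = 1.96  # approximate two-sided 95% critical value of the standard normal


def normal_confidence_interval(sample, z=Z_95):
    """Normal-approximation confidence interval for the mean of `sample`."""
    mean = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - z * standard_error, mean + z * standard_error


rng = random.Random(7)
true_mean = 2.0      # mean of an exponential distribution with rate 0.5
sample_size = 100
n_trials = 2000

hits = 0
for _ in range(n_trials):
    sample = [rng.expovariate(0.5) for _ in range(sample_size)]
    low, high = normal_confidence_interval(sample)
    hits += low <= true_mean <= high  # True counts as 1

coverage = hits / n_trials
print(f"Intervals containing the true mean: {coverage:.1%}")
```

The reported coverage should land near the nominal 95%, although with skewed data and moderate sample sizes it often comes out slightly lower.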
    import numpy as np
    import matplotlib.pyplot as plt

    # Simulate sampling from a non-normal (exponential) distribution
    np.random.seed(42)
    population = np.random.exponential(scale=2.0, size=10000)

    sample_size = 30
    n_samples = 1000
    sample_means = []

    for _ in range(n_samples):
        sample = np.random.choice(population, size=sample_size, replace=False)
        sample_means.append(np.mean(sample))

    plt.figure(figsize=(10, 4))

    plt.subplot(1, 2, 1)
    plt.hist(population, bins=40, color='skyblue', edgecolor='black')
    plt.title("Original Exponential Distribution")
    plt.xlabel("Value")
    plt.ylabel("Frequency")

    plt.subplot(1, 2, 2)
    plt.hist(sample_means, bins=30, color='salmon', edgecolor='black')
    plt.title("Distribution of Sample Means")
    plt.xlabel("Sample Mean")
    plt.ylabel("Frequency")

    plt.tight_layout()
    plt.show()

To connect this simulation to the Central Limit Theorem, consider the steps in the code above. First, you create a large "population" of values drawn from an exponential distribution, which is not normal and is typically right-skewed. Next, you repeatedly draw random samples of fixed size (here, 30) from this population. For each sample, you calculate its mean, collecting these means over many repetitions (here, 1,000 times).

The first histogram displays the original exponential population, clearly showing its skewed shape. The second histogram shows the distribution of the sample means. Notice how, even though the underlying population is not normal, the distribution of the sample means is far more symmetric and bell-shaped. This illustrates the Central Limit Theorem in action: as the size of each sample grows, the distribution of the sample means approaches a normal distribution, regardless of the population's original shape. Drawing more samples does not change that shape; it only makes the histogram smoother.
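
The histogram can be backed up with numbers. The theorem implies that the sample means should center on the population mean μ with a standard deviation close to σ/√n. The sketch below checks this under assumptions that differ slightly from the simulation above: it draws directly from an exponential distribution instead of from a stored population, uses 5,000 samples, and relies only on the standard library.

```python
import math
import random
import statistics

rng = random.Random(42)
scale = 2.0          # mean (and standard deviation) of the exponential distribution
sample_size = 30
n_samples = 5000

# Draw each sample directly from the distribution and record its mean.
sample_means = [
    statistics.fmean(rng.expovariate(1 / scale) for _ in range(sample_size))
    for _ in range(n_samples)
]

observed_mean = statistics.fmean(sample_means)
observed_sd = statistics.stdev(sample_means)
predicted_sd = scale / math.sqrt(sample_size)   # sigma / sqrt(n)

print(f"mean of sample means: {observed_mean:.3f} (population mean {scale})")
print(f"sd of sample means:   {observed_sd:.3f} (sigma / sqrt(n) = {predicted_sd:.3f})")
```

Both observed values should sit close to their predicted counterparts, which is what makes normal-based calculations on sample means workable in practice.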



Section 1. Chapter 29
