Understanding Sampling | Probability & Statistics

Definition

Sampling is the process of selecting a subset of data from a larger population to gain insights and make inferences about the whole. Collecting data from an entire population is often impractical or impossible. Sampling makes the analysis efficient, and a well-chosen sample still supports reliable conclusions about the population.

Simple Random Sampling

Every member of the population has an equal chance of being selected.
This is like drawing names out of a hat.

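P(\text{a specific individual is picked on a single draw}) = \frac{1}{N}

When $n$ individuals are drawn without replacement, each one's chance of ending up in the sample is:

P(\text{a specific individual is in the sample}) = \frac{n}{N}

Where:

$N$ = population size;
$n$ = sample size.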

Example 1:

You have a class of 30 students. You want to randomly select 5 for a survey.

Solution: Use a random number generator to select 5 unique numbers between 1 and 30. Each student has a $\tfrac{1}{30}$ chance of being the first number drawn, and a $\tfrac{5}{30} = \tfrac{1}{6}$ chance of being included in the sample.
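The sketch below uses Python's standard `random` module. It draws one simple random sample of 5 from 30 students, who are numbered 1 to 30 for illustration. It then checks the inclusion probability $\tfrac{n}{N}$ exactly by counting all equally likely 5-student subsets. The fixed seed is only there to make the run reproducible.

```python
import random
from fractions import Fraction
from itertools import combinations

N, n = 30, 5
students = list(range(1, N + 1))  # students labelled 1..30 (illustrative)

# One simple random sample: 5 distinct students, drawn without replacement.
rng = random.Random(42)  # fixed seed only so the illustration is reproducible
sample = rng.sample(students, n)
print(sorted(sample))

# Exact inclusion probability for student 1: every 5-student subset is
# equally likely, so count the subsets that contain student 1.
total = 0
hits = 0
for subset in combinations(students, n):
    total += 1
    if 1 in subset:
        hits += 1
p_included = Fraction(hits, total)
print(p_included)  # 1/6, i.e. n / N = 5 / 30
```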

Example 2:

You have a class of 30 students and want to select 5 to participate in a survey.

Total population: $N=30$;
Sample size: $n=5$.

What is the probability that Alice and Bob are both selected?

Total number of ways to choose 5 students from 30:

\binom{30}{5}

Number of favorable samples containing both Alice and Bob:
Fix Alice and Bob in the sample, then choose the other 3 students from the remaining 28:

\binom{28}{3}

So the probability is:

P = \frac{\binom{28}{3}}{\binom{30}{5}} = \frac{3276}{142506} = \frac{2}{87} \approx 0.023
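The same count can be checked with Python's `math.comb`. Wrapping the ratio in `fractions.Fraction` keeps the result exact instead of rounding it to a float.

```python
from fractions import Fraction
from math import comb

favourable = comb(28, 3)  # Alice and Bob fixed, 3 more chosen from the other 28
total = comb(30, 5)       # all possible 5-student samples
p_both = Fraction(favourable, total)

print(favourable, total)  # 3276 142506
print(p_both)             # 2/87
```

As a cross-check, the same value comes from drawing sequentially: Alice is included with probability $\tfrac{5}{30}$. Given that, Bob fills one of the remaining 4 slots among 29 students with probability $\tfrac{4}{29}$.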

Stratified Sampling

The population is divided into meaningful subgroups (strata), and random samples are taken from each.

n_h = \frac{N_h}{N} \times n

Where:

$N_h$ - size of subgroup $h$;
$N$ - total population size;
$n$ - total sample size;
$n_h$ - sample size from subgroup $h$.

Example:

A class has 30 students: 18 males and 12 females. You want to sample 10 students proportionally:

From males: $\tfrac{18}{30} \times 10 = 6$;
From females: $\tfrac{12}{30} \times 10 = 4$.
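Here is a minimal sketch of proportional stratified sampling with Python's `random` module. The roster labels `M1`..`M18` and `F1`..`F12` are hypothetical. The allocation applies $n_h = \tfrac{N_h}{N} \times n$ with integer division. That assumes the allocations come out as whole numbers, as they do here. Otherwise a rounding rule such as largest remainder would be needed, and this sketch does not implement one.

```python
import random

def proportional_allocation(strata_sizes, n):
    """n_h = N_h / N * n, rounded down (assumes whole-number allocations)."""
    N = sum(strata_sizes.values())
    return {h: N_h * n // N for h, N_h in strata_sizes.items()}

def stratified_sample(strata, n, rng):
    """Draw a simple random sample inside each stratum, sized proportionally."""
    sizes = {h: len(members) for h, members in strata.items()}
    alloc = proportional_allocation(sizes, n)
    return {h: rng.sample(strata[h], alloc[h]) for h in strata}

# Hypothetical class roster: 18 males and 12 females.
strata = {
    "male": [f"M{i}" for i in range(1, 19)],
    "female": [f"F{i}" for i in range(1, 13)],
}

alloc = proportional_allocation({h: len(m) for h, m in strata.items()}, 10)
print(alloc)  # {'male': 6, 'female': 4}

sample = stratified_sample(strata, 10, random.Random(0))
```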

Why it's good: Ensures representation of key subgroups.

Cluster Sampling

The population is split into groups (clusters), and entire clusters are randomly selected.

c = \text{number of clusters to sample}

Where:

Clusters are pre-existing groups (e.g., classrooms, teams);
You randomly pick entire clusters, not individuals.

Example 1:

Your school has 5 classrooms. You want a sample of 25 students, but surveying individuals is too time-consuming.

Solution: Since each classroom has about 25 students, randomly select 1 classroom and survey every student in it.

Example 2:

A university has 20 dorm buildings, each housing 50 students. You randomly select 4 dorms and survey everyone inside.

Number of clusters: $N=20$ (here $N$ counts dorms, not students);
Selected clusters: $n=4$;
Students per dorm: $M=50$;
Total students sampled: $n \times M = 200$.

What's the probability that a specific student (e.g., Sarah) is included?
It equals the probability that her dorm is selected:

P(\text{Sarah selected}) = \frac{4}{20} = 0.2
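The following sketch uses Python's `random` module to select whole clusters and keep every member of each selected cluster. The dorm and student labels are hypothetical. So is placing Sarah in dorm 7. Her inclusion probability is then checked exactly by enumerating every equally likely choice of 4 dorms.

```python
import random
from fractions import Fraction
from itertools import combinations

n_dorms, n_chosen, per_dorm = 20, 4, 50

# Hypothetical layout: dorms 1..20, each with 50 labelled students.
dorms = {
    d: [f"dorm{d}-student{s}" for s in range(1, per_dorm + 1)]
    for d in range(1, n_dorms + 1)
}

def cluster_sample(clusters, n, rng):
    """Pick n whole clusters at random and return every member of them."""
    chosen = rng.sample(sorted(clusters), n)
    members = [m for c in chosen for m in clusters[c]]
    return chosen, members

chosen, sample = cluster_sample(dorms, n_chosen, random.Random(1))
print(len(sample))  # 200 = n * M

# Assume Sarah lives in dorm 7: she is sampled exactly when dorm 7 is chosen.
subsets = list(combinations(range(1, n_dorms + 1), n_chosen))
p_sarah = Fraction(sum(1 for s in subsets if 7 in s), len(subsets))
print(p_sarah)  # 1/5
```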

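Complex case:
If 10 dorms have 30 students and 10 have 70 students, and you select 4 dorms at random, what is the expected sample size?

Let:

$D_{30} = 10$ dorms with 30 students;
$D_{70} = 10$ dorms with 70 students.

Each selected dorm is a small one with probability $\tfrac{10}{20}$ and a large one with probability $\tfrac{10}{20}$, so its expected size is $\tfrac{10}{20} \cdot 30 + \tfrac{10}{20} \cdot 70 = 50$. By linearity of expectation, the expected sample size is:

E = 4 \times \left(\frac{10}{20} \cdot 30 + \frac{10}{20} \cdot 70\right) = 4 \times 50 = 200

So even when clusters differ in size, the expected sample size equals the number of selected clusters times the average cluster size. Here the average dorm size is still 50, so the expectation matches Example 2. The actual sample size, however, varies from 120 (four small dorms) to 280 (four large dorms).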
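A brute-force check of this expectation, sketched in Python, averages the total size over all equally likely 4-dorm selections. It also reports the smallest and largest possible sample. `Fraction` keeps the average exact.

```python
from fractions import Fraction
from itertools import combinations

# Dorm sizes from the complex case: 10 dorms of 30 and 10 dorms of 70.
sizes = [30] * 10 + [70] * 10
n = 4

# Every 4-dorm selection is equally likely, so average over all of them.
totals = [sum(sizes[i] for i in combo) for combo in combinations(range(len(sizes)), n)]
expected = Fraction(sum(totals), len(totals))

print(expected)                  # 200
print(min(totals), max(totals))  # 120 280

# Shortcut by linearity of expectation: n times the average dorm size.
shortcut = n * Fraction(sum(sizes), len(sizes))
print(shortcut)                  # 200
```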

Systematic Sampling

Select every $k$-th item from a list.

k = \frac{N}{n}

Where:

$N$ - total population;
$n$ - sample size desired;
$k$ - sampling interval.

Example:

You have a list of 1000 customers and want a sample of 100, so:

k = \frac{1000}{100} = 10

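Pick a random start point between 1 and 10 (e.g., 7), then select every 10th customer: 7, 17, 27, and so on up to 997.

Why it's good: Easy to implement, and it spreads the sample evenly across the list. Watch out for lists with a repeating pattern whose period lines up with $k$, because that can bias the sample.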
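The following minimal sketch of systematic sampling uses Python list slicing. It assumes, as in the example, that $N$ is a multiple of $n$, so $k = N / n$ is a whole number. The customer IDs 1 to 1000 are hypothetical.

```python
import random

def systematic_sample(items, n, start):
    """Return every k-th item beginning at 1-based position `start`, with k = N // n.

    Assumes N is a multiple of n, as in the 1000 / 100 example; otherwise
    k would need a rounding rule and the sample size could differ from n.
    """
    k = len(items) // n
    if not 1 <= start <= k:
        raise ValueError(f"start must be between 1 and {k}")
    return items[start - 1::k]

customers = list(range(1, 1001))  # hypothetical customer IDs 1..1000
k = len(customers) // 100         # sampling interval, 10 here

# Random start in 1..k, then every k-th customer.
start = random.Random(3).randint(1, k)
sample = systematic_sample(customers, 100, start)

# The start point used in the text.
fixed = systematic_sample(customers, 100, 7)
print(fixed[:3], fixed[-1])  # [7, 17, 27] 997
```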

All Methods Applied to One Problem

Problem Setup:
You're studying cafeteria satisfaction at a school with 300 students across 10 classrooms (30 per room). You want a sample of 30 students.

Simple random: randomly pick 30 names from the full list;
Stratified: if 60% are boys and 40% girls, sample 18 boys and 12 girls;
Cluster: randomly select 1 class (30 students) and survey all;
Systematic: with $k = \tfrac{300}{30} = 10$, pick every 10th student from an ordered list, starting at a random position between 1 and 10.
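The sketch below runs all four methods on a hypothetical roster with Python's `random` module and compares their samples. The roster is an assumption: 300 students in 10 rooms of 30, where each room lists its 18 boys first and then its 12 girls. The seed is fixed only for reproducibility.

```python
import random

rng = random.Random(2024)  # fixed seed for a reproducible illustration

# Hypothetical roster: room = id // 30; in each room the first 18 are boys.
students = [
    {"id": i, "room": i // 30, "gender": "boy" if i % 30 < 18 else "girl"}
    for i in range(300)
]
n = 30

# Simple random: 30 students from the full list.
srs = rng.sample(students, n)

# Stratified: proportional allocation, 30 * 180 / 300 = 18 boys and 12 girls.
boys = [s for s in students if s["gender"] == "boy"]
girls = [s for s in students if s["gender"] == "girl"]
stratified = (rng.sample(boys, n * len(boys) // len(students))
              + rng.sample(girls, n * len(girls) // len(students)))

# Cluster: one whole classroom chosen at random.
room = rng.randrange(10)
cluster = [s for s in students if s["room"] == room]

# Systematic: every k-th student after a random start in 1..k.
k = len(students) // n
start = rng.randint(1, k)
systematic = students[start - 1::k]

for name, sample in [("simple random", srs), ("stratified", stratified),
                     ("cluster", cluster), ("systematic", systematic)]:
    boys_in = sum(s["gender"] == "boy" for s in sample)
    print(f"{name:13s} size={len(sample)} boys={boys_in}")
```

Because this roster repeats a boys-then-girls pattern every 30 positions, the systematic sample always contains either 10 or 20 boys, never the population-proportional 18. This shows how a periodic list can bias systematic sampling.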

Summary

Sampling reduces data collection effort while still allowing generalization to the population;
Simple random and stratified sampling generally give the most reliable estimates;
Cluster sampling is efficient but works best when each cluster resembles the population as a whole;
Systematic sampling is simple and practical;
Convenience sampling (surveying whoever is easiest to reach) tends to produce biased samples and should be avoided when possible;
Always document your sampling method in real-world analysis.


Section 5. Chapter 5
