Sample Size Basics | A/B Testing Foundations
Applied Hypothesis Testing & A/B Testing

Sample Size Basics

Understanding how to determine the appropriate sample size is a crucial foundation for reliable A/B testing. The sample size you choose directly affects the validity of your experiment’s results. If your sample is too small, you may not have enough data to detect a meaningful difference between your control and treatment groups, even if one exists. On the other hand, using an excessively large sample can waste resources and time. The right sample size balances efficiency with the ability to draw trustworthy conclusions, which is closely related to a concept called statistical power.

Several key factors influence how you determine the correct sample size for an A/B test:

  • Effect size: the minimum difference between groups that you want to be able to detect. Smaller effect sizes require larger sample sizes to detect reliably;
  • Significance level (alpha): the probability of incorrectly concluding that a difference exists when it does not (a false positive). Commonly, alpha is set to 0.05;
  • Statistical power (1 - beta): the probability of correctly detecting a real difference. Higher power (typically 0.8 or 80% and above) requires larger sample sizes;
  • Variability: the amount of natural variation in your data, often measured by standard deviation. More variability means you need a larger sample to distinguish real effects from random noise.

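To make these factors concrete, here is a small sketch (the specific values are illustrative assumptions) that holds alpha, power, and the baseline rate fixed and varies only the minimum detectable effect, using the two-proportion formula discussed next. The required sample size per group grows rapidly as the effect you want to detect shrinks:

```python
from scipy.stats import norm
import math

# Fixed design choices (illustrative assumptions)
alpha, power, p = 0.05, 0.8, 0.10

z_alpha = norm.ppf(1 - alpha / 2)  # z-score for two-sided significance level
z_beta = norm.ppf(power)           # z-score for desired power

# Halving the minimum detectable effect roughly quadruples the required n
for d in (0.01, 0.02, 0.05):
    n = 2 * (z_alpha + z_beta) ** 2 * p * (1 - p) / d ** 2
    print(f"min detectable effect {d:.2f} -> n per group = {math.ceil(n)}")
```

Because d appears squared in the denominator, the relationship is quadratic: detecting a 1% lift takes about four times as many users as detecting a 2% lift.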
In practice, you can use formulas or online calculators to estimate the required sample size. For comparing two proportions (such as conversion rates), a common formula is:

n = 2 × [(Z_{1-α/2} + Z_{1-β})² × p × (1 − p)] / d²

Where:

  • n is the sample size per group;
  • Z_{1-α/2} is the z-score for your chosen significance level;
  • Z_{1-β} is the z-score for your chosen power;
  • p is the estimated baseline conversion rate;
  • d is the minimum detectable effect (the difference in conversion rate you care about).

Example: if your current conversion rate is 10% (p = 0.10) and you want to detect a 2% absolute increase (d = 0.02) with 80% power and a 5% significance level, then Z_{1-α/2} ≈ 1.96 and Z_{1-β} ≈ 0.84, so n = 2 × (1.96 + 0.84)² × 0.10 × 0.90 / 0.02² ≈ 3,528 participants per group (about 3,532 with unrounded z-scores).

You can also use Python libraries like scipy.stats to perform these calculations programmatically, which helps ensure your A/B test is designed to deliver reliable, actionable results.

from scipy.stats import norm
import math

# Set parameters for the A/B test
baseline_rate = 0.10  # current conversion rate (10%)
min_effect = 0.02     # minimum detectable effect (2%)
alpha = 0.05          # significance level (5%)
power = 0.8           # desired statistical power (80%)

# Calculate z-scores for alpha and power
z_alpha = norm.ppf(1 - alpha / 2)
z_beta = norm.ppf(power)

# Average conversion rate under null hypothesis
p = baseline_rate
d = min_effect

# Sample size formula for two proportions (per group)
n = 2 * ((z_alpha + z_beta)**2) * p * (1 - p) / d**2

# Round up to nearest whole number
n = math.ceil(n)

print(f"Required sample size per group: {n}")
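If the statsmodels library is available, the same design can be cross-checked with its power utilities. Note that statsmodels standardizes the effect with an arcsine transform, so its answer differs somewhat from the hand-rolled formula above; the 10% → 12% lift mirrors the running example:

```python
import math

from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Standardized (arcsine) effect size for lifting conversion from 10% to 12%
effect_size = proportion_effectsize(0.12, 0.10)

# Solve for the per-group sample size at 5% significance and 80% power
n = NormalIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,
    power=0.8,
    ratio=1.0,
    alternative="two-sided",
)
print(f"statsmodels estimate per group: {math.ceil(n)}")
```

Getting the same order of magnitude from two independent methods is a useful sanity check before committing to an experiment's duration.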
Definition

Statistical power is the probability that an experiment will detect an effect when there is one to be found. In A/B testing, high statistical power means you are more likely to observe real differences between variants, reducing the risk of missing a true improvement or change.
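This definition can be checked empirically: simulate many A/B tests at the sample size from the running example (n ≈ 3,532 per group, carried over as an assumption) and count how often a pooled two-proportion z-test detects the true 2-point lift. The rejection rate is an estimate of the test's power:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3532                          # per-group sample size from the example above
p_control, p_treatment = 0.10, 0.12
n_sims = 2000
z_crit = 1.959964                 # two-sided 5% critical value

rejections = 0
for _ in range(n_sims):
    # Simulate conversion counts in each group
    x1 = rng.binomial(n, p_control)
    x2 = rng.binomial(n, p_treatment)
    # Pooled two-proportion z-test
    p_pool = (x1 + x2) / (2 * n)
    se = np.sqrt(p_pool * (1 - p_pool) * 2 / n)
    z = (x2 / n - x1 / n) / se
    if abs(z) > z_crit:
        rejections += 1

print(f"Empirical power: {rejections / n_sims:.2f}")
```

In runs of this sketch the estimated power tends to land slightly below the designed 0.8, because the formula approximates the variance using only the baseline rate; plugging the average of the two rates into p × (1 − p) yields a somewhat larger, more conservative sample size.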

What can happen if you run an A/B test with a sample size that is too small (i.e., the experiment is underpowered)?



Section 3, Chapter 4

