Learn Randomization and Sample Size | Designing Effective A/B Tests
A/B Testing with Python

Randomization and Sample Size


Random assignment is the foundation of trustworthy A/B testing. By assigning users to groups at random, you make the groups statistically similar on average, in both measured and unmeasured characteristics, which guards against selection bias. Selection bias occurs when certain users are more likely to end up in one group than in the other, which can distort your results and lead to false conclusions. Without proper randomization, differences between groups may be caused by underlying user characteristics rather than by the variable you are testing.
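To make the effect of selection bias concrete, here is a small simulation sketch. Everything in it is hypothetical: the "engaged" flag, the conversion probabilities, and the population size are made up for illustration, and the new variant is assumed to have no true effect at all. Under those assumptions, letting engaged users self-select into group B produces a large apparent difference, while a coin-flip assignment does not.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: "engaged" users convert more often,
# regardless of which variant they see (the variant has no real effect).
users = [{"engaged": random.random() < 0.3} for _ in range(20_000)]
for u in users:
    u["p_convert"] = 0.20 if u["engaged"] else 0.05


def conversion_rate(group):
    """Simulate one conversion draw per user and return the observed rate."""
    return sum(random.random() < u["p_convert"] for u in group) / len(group)


# Biased assignment: engaged users opt in to group B themselves.
biased_a = [u for u in users if not u["engaged"]]
biased_b = [u for u in users if u["engaged"]]

# Random assignment: a coin flip decides the group, independent of engagement.
random_a, random_b = [], []
for u in users:
    (random_a if random.random() < 0.5 else random_b).append(u)

biased_diff = conversion_rate(biased_b) - conversion_rate(biased_a)
random_diff = conversion_rate(random_b) - conversion_rate(random_a)

print(f"Difference with self-selected groups: {biased_diff:.3f}")
print(f"Difference with randomized groups:    {random_diff:.3f}")
```

The self-selected comparison reflects who chose each group, not what the variant does, which is exactly the distortion that randomization is meant to prevent.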

Several randomization techniques are commonly used in A/B testing:

  • Simple randomization: each user has an equal chance of being assigned to any group, often using a random number generator;
  • Block randomization: users are grouped into blocks, and within each block, users are randomly assigned to different groups to maintain balance throughout the test;
  • Stratified randomization: users are divided into strata based on characteristics (such as age or location), and randomization occurs within each stratum to ensure all subgroups are represented.
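As an illustration of block randomization, the sketch below assigns users in shuffled blocks so that group sizes stay balanced as users arrive. The function name `block_randomize`, the block size of 4, and the seed are assumptions chosen for this example, not part of any library.

```python
import random


def block_randomize(users, groups=("A", "B"), block_size=4, seed=None):
    """Assign users to groups in shuffled blocks so group sizes stay balanced."""
    if block_size % len(groups) != 0:
        raise ValueError("block_size must be a multiple of the number of groups")
    rng = random.Random(seed)
    per_group = block_size // len(groups)
    assignment = {}
    for start in range(0, len(users), block_size):
        # Each block contains every group label the same number of times
        labels = list(groups) * per_group
        rng.shuffle(labels)
        # zip stops at the shorter sequence, so a partial final block still works
        for user, label in zip(users[start:start + block_size], labels):
            assignment[user] = label
    return assignment


users = [f"user_{i + 1}" for i in range(20)]
assignment = block_randomize(users, block_size=4, seed=7)

counts = {g: list(assignment.values()).count(g) for g in ("A", "B")}
print(counts)  # 20 users in full blocks of 4 always split 10 / 10
```

Because every complete block holds each label equally often, the groups can differ in size only within the final, partial block.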

Simple randomization is the most straightforward technique and is sufficient for most digital experiments. However, block and stratified randomization help maintain balance when sample sizes are small or when specific subgroups matter.
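When a subgroup is small, stratified randomization can be sketched as follows. The `device` attribute, the user records, and the helper `stratified_randomize` are hypothetical; the idea is only to shuffle users inside each stratum and then deal them out to the groups in turn.

```python
import random
from collections import Counter, defaultdict


def stratified_randomize(users, stratum_key, groups=("A", "B"), seed=None):
    """Shuffle users within each stratum, then deal them to groups in turn."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for user in users:
        strata[user[stratum_key]].append(user["id"])
    assignment = {}
    for member_ids in strata.values():
        rng.shuffle(member_ids)  # random order within the stratum
        for i, user_id in enumerate(member_ids):
            assignment[user_id] = groups[i % len(groups)]
    return assignment


# Hypothetical users: 4 on desktop, 16 on mobile
users = [
    {"id": f"user_{i + 1}", "device": "desktop" if i < 4 else "mobile"}
    for i in range(20)
]
assignment = stratified_randomize(users, stratum_key="device", seed=1)

# Count users per (stratum, group) pair to check the balance
balance = Counter((u["device"], assignment[u["id"]]) for u in users)
for (device, group), count in sorted(balance.items()):
    print(f"{device:8s} group {group}: {count}")
```

With simple randomization, the four desktop users could easily all land in one group; here each stratum is split evenly between A and B.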

```python
import random

# Simulating random assignment of 20 users to groups 'A' and 'B'
users = [f"user_{i+1}" for i in range(20)]
groups = ['A', 'B']

# Dictionary to hold group assignments
assignment = {}

for user in users:
    # Randomly choosing a group for each user
    assigned_group = random.choice(groups)
    assignment[user] = assigned_group

# Printing out the assignment
for user, group in assignment.items():
    print(f"{user} assigned to group {group}")
```

Determining the right sample size is crucial for the reliability of your A/B test results. Too small a sample can lead to unreliable or inconclusive results, while an unnecessarily large sample wastes resources. The sample size directly affects the statistical power of your test, that is, the probability of detecting a true difference if one exists.
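One way to see how sample size drives power is to simulate many experiments and count how often a test detects a difference that really exists. The sketch below is an illustration under assumed values: conversion rates of 10% and 13%, a two-sided two-proportion z-test at alpha = 0.05, and 300 simulated experiments per sample size. The helper names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def z_test_rejects(conv_a, conv_b, n, alpha=0.05):
    """Two-sided two-proportion z-test for two groups of equal size n."""
    p_pool = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(2 * p_pool * (1 - p_pool) / n)
    if se == 0:
        return False  # no conversions (or all conversions) in both groups
    z = ((conv_b - conv_a) / n) / se
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


def simulated_power(n, p1, p2, n_sims=300, seed=0):
    """Share of simulated experiments in which the test detects the difference."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        conv_a = sum(rng.random() < p1 for _ in range(n))
        conv_b = sum(rng.random() < p2 for _ in range(n))
        rejections += z_test_rejects(conv_a, conv_b, n)
    return rejections / n_sims


power_small = simulated_power(n=200, p1=0.10, p2=0.13)
power_large = simulated_power(n=2000, p1=0.10, p2=0.13)

print(f"Estimated power with  200 users per group: {power_small:.2f}")
print(f"Estimated power with 2000 users per group: {power_large:.2f}")
```

With the smaller sample, most simulated experiments miss the real difference; with the larger one, most of them detect it.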

Key concepts in sample size calculation include:

  • Minimum detectable effect (MDE): the smallest difference between groups that you want to be able to detect;
  • Significance level (alpha): the probability of a false positive, that is, of declaring a difference when none exists (commonly set at 0.05);
  • Power (1 - beta): the probability of detecting a true effect at least as large as the MDE (commonly set at 0.8, or 80%).

Power analysis combines these factors to estimate the minimum number of users needed in each group. A larger effect size or a higher (less strict) significance level requires fewer users, while a smaller effect or higher power requires more users.

```python
from scipy.stats import norm
import math

# Parameters for sample size calculation
alpha = 0.05  # Significance level
power = 0.8   # Desired power
p1 = 0.10     # Baseline conversion rate (control group)
p2 = 0.13     # Expected conversion rate (treatment group)
effect_size = abs(p2 - p1)  # Minimum detectable effect

# Calculate pooled standard deviation
pooled_prob = (p1 + p2) / 2
std_dev = math.sqrt(2 * pooled_prob * (1 - pooled_prob))

# Z-scores for alpha and power
z_alpha = norm.ppf(1 - alpha / 2)
z_beta = norm.ppf(power)

# Sample size formula for two-proportion z-test
n = ((z_alpha + z_beta) * std_dev / effect_size) ** 2

print(f"Required sample size per group: {math.ceil(n)}")
```
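The calculation above can be wrapped in a function to check how the required sample size responds to each input. This is a sketch under the same approximation; the helper name `required_n` is hypothetical, and it uses `statistics.NormalDist` from the standard library in place of SciPy, which should give the same z-scores.

```python
import math
from statistics import NormalDist


def required_n(p1, p2, alpha=0.05, power=0.8):
    """Approximate users per group for a two-sided two-proportion z-test."""
    effect_size = abs(p2 - p1)
    pooled_prob = (p1 + p2) / 2
    std_dev = math.sqrt(2 * pooled_prob * (1 - pooled_prob))
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) * std_dev / effect_size) ** 2)


baseline = required_n(0.10, 0.13)                 # same inputs as above
smaller_effect = required_n(0.10, 0.12)           # smaller MDE
higher_power = required_n(0.10, 0.13, power=0.9)  # more power
looser_alpha = required_n(0.10, 0.13, alpha=0.10) # less strict alpha

print(f"Baseline:             {baseline}")
print(f"Smaller effect (2pp): {smaller_effect}")
print(f"Power 0.9:            {higher_power}")
print(f"Alpha 0.10:           {looser_alpha}")
```

Shrinking the detectable effect or raising the power increases the requirement, while loosening alpha lowers it. Because the effect size is squared in the denominator, halving the MDE roughly quadruples the number of users needed.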

What is the main risk of poor randomization or using too small a sample size in A/B testing?


Section 2. Chapter 2
