Randomization and Sample Size | Designing Effective A/B Tests


Random assignment is the foundation of trustworthy A/B testing. By assigning users to groups at random, you make the groups comparable on average, both in characteristics you can measure and in those you cannot, which protects against selection bias. Selection bias occurs when certain kinds of users are more likely to end up in one group than in another, which can distort your results and lead to false conclusions. Without proper randomization, a difference between groups may be caused by underlying characteristics of the users rather than by the change you are testing.

Several randomization techniques are commonly used in A/B testing:

Simple randomization: each user is independently assigned to a group with equal probability, typically using a random number generator;
Block randomization: users are grouped into small blocks (for example, in order of arrival), and within each block an equal number of users is randomly assigned to each group, which keeps group sizes balanced throughout the test;
Stratified randomization: users are divided into strata based on characteristics such as age or location, and randomization is performed separately within each stratum so that every subgroup is evenly represented in all groups.
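
Block and stratified randomization can be sketched with Python's standard library as follows. This is a minimal illustration, not a production assignment service: the function names, the block size of 4, and the "US"/"DE" strata are hypothetical choices made for the example.

```python
import random
from collections import defaultdict

def block_randomize(users, groups=("A", "B"), block_size=4, seed=None):
    """Permuted-block randomization in arrival order (illustrative sketch).

    Every block holds each group block_size / len(groups) times in a shuffled
    order, so running group sizes never differ by more than that amount.
    """
    if block_size % len(groups) != 0:
        raise ValueError("block_size must be a multiple of the number of groups")
    rng = random.Random(seed)
    assignment = {}
    block = []
    for user in users:
        if not block:
            # Start a fresh block, e.g. ['A', 'A', 'B', 'B'] in random order
            block = list(groups) * (block_size // len(groups))
            rng.shuffle(block)
        assignment[user] = block.pop()
    return assignment

def stratified_randomize(users, stratum_of, groups=("A", "B"), seed=None):
    """Randomize separately inside each stratum (illustrative sketch).

    Users in a stratum are shuffled and then dealt to groups in rotation,
    so within every stratum group sizes differ by at most one.
    """
    rng = random.Random(seed)
    members_by_stratum = defaultdict(list)
    for user in users:
        members_by_stratum[stratum_of[user]].append(user)
    assignment = {}
    for members in members_by_stratum.values():
        rng.shuffle(members)
        for i, user in enumerate(members):
            assignment[user] = groups[i % len(groups)]
    return assignment

users = [f"user_{i+1}" for i in range(20)]
# Hypothetical strata: the first 12 users are from "US", the other 8 from "DE"
country = {user: ("US" if i < 12 else "DE") for i, user in enumerate(users)}

print(block_randomize(users, seed=1))
print(stratified_randomize(users, country, seed=1))
```

Dealing shuffled users out in rotation is one simple way to get a random but balanced split inside each stratum; other schemes (for example, block randomization within each stratum) are also common.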

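Simple randomization is the most straightforward approach and is usually sufficient for digital experiments with large numbers of users. With smaller samples, however, it can produce noticeably unequal groups by chance, so block or stratified randomization is useful when samples are small or when important subgroups must be balanced.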
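
To put a number on the small-sample concern: under simple 50/50 randomization the size of group A follows a binomial distribution, so the chance of a lopsided split can be computed exactly. The helper below is a hypothetical illustration, and the 40% cutoff is an arbitrary choice.

```python
import math

def prob_lopsided_split(n_users, min_group_size):
    """Exact probability that simple 50/50 randomization of n_users leaves
    one of the two groups with fewer than min_group_size users."""
    if 2 * min_group_size > n_users:
        raise ValueError("min_group_size must be at most half of n_users")
    # Size of group A ~ Binomial(n_users, 0.5); lower tail P(A < min_group_size)
    lower_tail = sum(math.comb(n_users, k) for k in range(min_group_size))
    # Group B is too small exactly when A is too large; by symmetry that tail
    # has the same probability and does not overlap the first one
    return 2 * lower_tail / 2 ** n_users

# Chance that either group gets less than 40% of the users
for n in (20, 200, 2000):
    print(f"{n} users: {prob_lopsided_split(n, (2 * n) // 5):.2e}")
```

With 20 users the probability is about 0.26, so roughly one test in four ends up split 13/7 or worse. With thousands of users, a split that uneven becomes vanishingly unlikely.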


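import random

# Simple randomization: each of 20 users is independently assigned to
# group 'A' or 'B' with equal probability
random.seed(42)  # fixed seed so the assignment is reproducible

users = [f"user_{i+1}" for i in range(20)]
groups = ['A', 'B']

# Dictionary mapping each user to their assigned group
assignment = {}

for user in users:
    # random.choice picks 'A' or 'B' with probability 0.5 each
    assignment[user] = random.choice(groups)

# Print each user's assignment
for user, group in assignment.items():
    print(f"{user} assigned to group {group}")

# Group sizes are not guaranteed to be equal under simple randomization
for group in groups:
    count = sum(1 for g in assignment.values() if g == group)
    print(f"Group {group}: {count} users")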
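
Drawing a fresh random number each time a user arrives can move a returning user between groups unless the assignment is stored. A common alternative in production systems (an assumption about your setup, not a method this lesson prescribes) is to hash the user ID, salted with an experiment name, into a bucket. The function name and the experiment name `checkout_button_test` below are hypothetical.

```python
import hashlib

def hash_assign(user_id, experiment, groups=("A", "B")):
    """Deterministic pseudo-random assignment (illustrative sketch): the same
    user always lands in the same group for a given experiment."""
    # Salting with the experiment name means different experiments
    # do not reuse the same split of users
    key = f"{experiment}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(groups)
    return groups[bucket]

users = [f"user_{i+1}" for i in range(20)]
for user in users:
    print(f"{user} assigned to group {hash_assign(user, 'checkout_button_test')}")
```

Because the hash output behaves like a uniform random number, this still acts as simple randomization across users, while staying consistent for each individual user.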

Determining the right sample size is crucial for the reliability of your A/B test results. Too small a sample can lead to unreliable or inconclusive results, while an unnecessarily large sample wastes traffic and time. Sample size directly affects the statistical power of your test: the probability of detecting a true difference if one exists.
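
To see how sample size drives power, the sketch below computes the approximate power of a two-sided two-proportion z-test for several group sizes, using `statistics.NormalDist` from Python's standard library. The 10% and 13% conversion rates are assumed values, and the function name is hypothetical; it uses the same pooled-variance normal approximation as the sample-size code in this lesson.

```python
from statistics import NormalDist

def power_two_proportions(n_per_group, p1, p2, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    Pooled-variance normal approximation; ignores the negligible chance
    of rejecting in the wrong direction.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    pooled = (p1 + p2) / 2
    # Standard error of the difference in conversion rates
    se = (2 * pooled * (1 - pooled) / n_per_group) ** 0.5
    return std_normal.cdf(abs(p2 - p1) / se - z_alpha)

# Assumed scenario: baseline 10% conversion, treatment 13%
for n in (250, 500, 1000, 2000, 4000):
    print(f"n = {n:>4} per group -> power = {power_two_proportions(n, 0.10, 0.13):.2f}")
```

Power rises steadily with the number of users per group, which is why the sample size must be fixed before the test starts rather than chosen afterwards.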

Key concepts in sample size calculation include:

Minimum detectable effect (MDE): the smallest difference between groups that you want to be able to detect, for example an increase in conversion rate from 10% to 13%;
Significance level (alpha): the probability of a false positive, that is, declaring a difference when none exists (commonly set at 0.05);
Power (1 - beta): the probability of detecting a true effect at least as large as the MDE (commonly set at 0.8, or 80%).

Power analysis combines these factors to estimate the minimum number of users needed in each group. Larger effect sizes or a larger alpha (a less strict significance threshold) require fewer users, while smaller effects, a smaller alpha, or higher power require more users.
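
For two conversion rates, this power analysis reduces to the normal-approximation formula implemented in the code below, where the pooled rate p̄ is used for the variance term (an approximation; some references use separate variances for the two groups). With alpha = 0.05, power = 0.8, p1 = 0.10 and p2 = 0.13:

```latex
n = \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2 \cdot 2\,\bar{p}\,(1 - \bar{p})}{(p_2 - p_1)^2},
\qquad \bar{p} = \frac{p_1 + p_2}{2}

\bar{p} = 0.115, \qquad
n = \frac{(1.9600 + 0.8416)^2 \cdot 2(0.115)(0.885)}{(0.03)^2} \approx 1775.2
\;\Rightarrow\; 1776 \text{ users per group}
```

The result is always rounded up, since rounding down would leave the test slightly below the target power.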


from scipy.stats import norm
import math

# Parameters for sample size calculation
alpha = 0.05  # Significance level (two-sided)
power = 0.8   # Desired power (1 - beta)
p1 = 0.10     # Baseline conversion rate (control group)
p2 = 0.13     # Expected conversion rate (treatment group)
effect_size = abs(p2 - p1)  # Minimum detectable effect (absolute difference)

# Spread of the difference in conversion rates, approximated with the
# pooled (average) rate for both groups
pooled_prob = (p1 + p2) / 2
std_dev = math.sqrt(2 * pooled_prob * (1 - pooled_prob))

# Critical z-values: two-sided for alpha, one-sided for power
z_alpha = norm.ppf(1 - alpha / 2)
z_beta = norm.ppf(power)

# Sample size per group for a two-proportion z-test
n = ((z_alpha + z_beta) * std_dev / effect_size) ** 2

# Round up, since a fractional user is not possible
print(f"Required sample size per group: {math.ceil(n)}")
# Prints: Required sample size per group: 1776
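
Wrapping the same calculation in a function makes it easy to check how each factor changes the requirement. This sketch swaps SciPy's `norm.ppf` for the standard library's `statistics.NormalDist().inv_cdf`, which computes the same quantiles; the function name and the list of MDE values are hypothetical choices.

```python
import math
from statistics import NormalDist

def sample_size_per_group(p1, p2, alpha=0.05, power=0.8):
    """Users needed per group for a two-sided two-proportion z-test
    (pooled-variance normal approximation, equal group sizes)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    pooled = (p1 + p2) / 2
    variance_term = 2 * pooled * (1 - pooled)
    n = (z_alpha + z_beta) ** 2 * variance_term / (p2 - p1) ** 2
    return math.ceil(n)

baseline = 0.10  # assumed baseline conversion rate

# Smaller minimum detectable effects need many more users
for mde in (0.01, 0.02, 0.03, 0.05):
    print(f"MDE {mde:.2f}: {sample_size_per_group(baseline, baseline + mde)} per group")

# Raising power increases the requirement (so would lowering alpha)
for pw in (0.8, 0.9):
    print(f"power {pw}: {sample_size_per_group(0.10, 0.13, power=pw)} per group")
```

Because the MDE enters the formula squared in the denominator, halving the effect you want to detect roughly quadruples the number of users required.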


Section 2. Chapter 2
