Learn Randomized Response | Differential Privacy in Machine Learning & Real Systems


Randomized response is a foundational technique for achieving local differential privacy in survey data collection. Each participant perturbs their answer to a sensitive binary question (such as "Have you ever committed tax fraud?") according to a probabilistic protocol before reporting it, so that even the data collector cannot be certain of any individual's true answer. Because the perturbation follows a known distribution, you can still estimate population-level statistics while every respondent retains plausible deniability.

The basic randomized response protocol for a binary question works as follows: each respondent privately runs a randomizing device (such as a biased coin or a random number generator). With probability $p$, they report their true answer. With probability $1 - p$, they report a random answer, chosen with equal probability between "Yes" and "No". This way, even if a respondent reports "Yes", you cannot be sure whether it reflects their real answer or was randomly chosen. Each individual's answer is therefore protected, while the aggregate responses can still be used to estimate the true proportion of "Yes" answers in the population, provided you correct for the noise the protocol introduces and collect enough responses for the estimate to be precise.


# Randomized response for a binary ("Yes"/"No") question

import random

def randomized_response(true_answer: bool, p: float = 0.7) -> bool:
    """
    Simulates the randomized response protocol for a binary question.
    Args:
        true_answer (bool): The respondent's actual answer (True for "Yes", False for "No").
        p (float): Probability to report the true answer (0 < p < 1).
    Returns:
        bool: The (possibly randomized) reported answer.
    """
    if random.random() < p:
        return true_answer
    else:
        return random.choice([True, False])

# Example: simulate 10 responses from a respondent whose true answer is "Yes"
responses = [randomized_response(True, p=0.7) for _ in range(10)]
print("Simulated responses:", responses)
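
The protocol above only produces noisy reports; recovering a population statistic requires undoing the noise. Under this protocol the probability of a reported "Yes" is $p \cdot \pi + (1 - p)/2$, where $\pi$ is the true proportion of "Yes" answers, so solving for $\pi$ gives a debiased estimate. The following is a minimal sketch under stated assumptions: the function name `estimate_true_proportion`, the simulated population of 20,000 respondents, the 30% true rate, and the fixed seed are all illustrative choices, not part of the original lesson.

```python
import random

def randomized_response(true_answer: bool, p: float, rng: random.Random) -> bool:
    """Same protocol as above, with an explicit generator so runs are reproducible."""
    if rng.random() < p:
        return true_answer
    return rng.choice([True, False])

def estimate_true_proportion(reports: list[bool], p: float) -> float:
    """Debias the observed "Yes" rate (hypothetical helper for illustration).

    Under the protocol, P(report Yes) = p * pi + (1 - p) / 2, where pi is the
    true proportion of "Yes". Solving for pi gives the estimator below.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    if not reports:
        raise ValueError("reports must be non-empty")
    observed = sum(reports) / len(reports)
    estimate = (observed - (1 - p) / 2) / p
    # Sampling noise can push the raw estimate slightly outside [0, 1]; clamp it.
    return min(1.0, max(0.0, estimate))

# Assumed population: 20,000 respondents, about 30% of whom would truthfully say "Yes"
rng = random.Random(0)
true_answers = [rng.random() < 0.3 for _ in range(20_000)]
reports = [randomized_response(answer, 0.7, rng) for answer in true_answers]

print("Observed 'Yes' rate:", sum(reports) / len(reports))
print("Debiased estimate:  ", estimate_true_proportion(reports, p=0.7))
```

The observed rate is pulled toward 0.5 by the random answers, while the debiased estimate should land near the assumed true rate. Dividing by $p$ also amplifies sampling noise, so smaller values of $p$ (more privacy) require more respondents for the same precision.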

Study More

The mathematical privacy guarantee of randomized response can be analyzed using the concept of epsilon-local differential privacy. By carefully choosing the probability $p$, you can control the privacy parameter epsilon, which quantifies how much the reported answer reveals about the true answer. For a deeper treatment, see Dwork & Roth's "The Algorithmic Foundations of Differential Privacy" (2014), Section 3.2.
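
To make the link between $p$ and epsilon concrete: under the protocol described in this chapter, a respondent whose true answer is "Yes" reports "Yes" with probability $(1 + p)/2$, while one whose true answer is "No" reports "Yes" with probability $(1 - p)/2$. The ratio of these probabilities bounds what a single report can reveal, which gives $\varepsilon = \ln\frac{1 + p}{1 - p}$. The sketch below computes this in both directions; the function names `epsilon_from_p` and `p_from_epsilon` are hypothetical helpers introduced here for illustration.

```python
import math

def epsilon_from_p(p: float) -> float:
    """Privacy parameter for the protocol: truth w.p. p, fair random answer otherwise."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    p_yes_given_yes = p + (1 - p) / 2  # truthful "Yes" plus a lucky random "Yes"
    p_yes_given_no = (1 - p) / 2       # only a random "Yes" is possible
    # By symmetry, the same ratio applies to reported "No" answers.
    return math.log(p_yes_given_yes / p_yes_given_no)

def p_from_epsilon(epsilon: float) -> float:
    """Inverse mapping: the truth probability p that yields a target epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return (math.exp(epsilon) - 1) / (math.exp(epsilon) + 1)

for p in (0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f}  ->  epsilon = {epsilon_from_p(p):.3f}")
```

With the default $p = 0.7$ used in the example above, this works out to epsilon of roughly 1.73; at $p = 0.5$ it equals $\ln 3$. Lower values of $p$ give a smaller epsilon (stronger privacy) at the cost of noisier estimates.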

1. Which of the following best describes how randomized response protects individual privacy in a survey?

2. If you collect many randomized responses to a binary question using the randomized response protocol, what must you do to accurately estimate the true proportion of "Yes" answers in the population?


Section 3. Chapter 3
