Learn Randomized Response | Differential Privacy in Machine Learning & Real Systems

Randomized Response

Randomized response is a foundational technique for achieving local differential privacy in survey data collection. When you use randomized response, each participant perturbs their answer to a sensitive binary question (such as "Have you ever committed tax fraud?") according to a probabilistic protocol, so that even the data collector cannot be certain of the true answer from any individual. This protocol allows you to estimate population-level statistics with strong privacy guarantees for every respondent.

The basic randomized response protocol for a binary question works as follows: each respondent flips a biased coin (or draws a random number). With probability $p$, they report their true answer. With probability $1 - p$, they report a random answer, chosen with equal probability between "Yes" and "No". Overall, a respondent therefore reports their true answer with probability $p + (1 - p)/2 = (1 + p)/2$. This way, even if a respondent answers "Yes", you cannot be sure whether it reflects their real answer or was randomly chosen. The protocol gives every individual plausible deniability, while the aggregate responses can still be used to estimate the true proportion of "Yes" answers in the population, once you correct for the noise introduced by the protocol.

```python
# Randomized response for a binary ("Yes"/"No") question
import random

def randomized_response(true_answer: bool, p: float = 0.7) -> bool:
    """
    Simulates the randomized response protocol for a binary question.

    Args:
        true_answer (bool): The respondent's actual answer (True for "Yes", False for "No").
        p (float): Probability of answering truthfully; otherwise the report
            is a fair coin flip between "Yes" and "No" (0 < p < 1).

    Returns:
        bool: The (possibly randomized) reported answer.
    """
    if random.random() < p:
        # Truthful branch: report the real answer.
        return true_answer
    else:
        # Random branch: report "Yes" or "No" with equal probability.
        return random.choice([True, False])

# Example: simulate 10 responses from a respondent whose true answer is "Yes"
responses = [randomized_response(True, p=0.7) for _ in range(10)]
print("Simulated responses:", responses)
```
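The text says the aggregate can be corrected for the protocol's noise but does not spell out how. Under the protocol above, if the true "Yes" proportion is pi, the expected observed "Yes" rate is p * pi + (1 - p) / 2, so solving for pi gives a simple debiased estimate. The following is a minimal sketch of that correction, not a definitive implementation. The function name `estimate_true_proportion`, the 30% true rate, the sample size, and the fixed seed are all assumptions made for illustration.

```python
import random

def randomized_response(true_answer: bool, p: float, rng: random.Random) -> bool:
    """Report the true answer with probability p, otherwise a fair coin flip."""
    if rng.random() < p:
        return true_answer
    return rng.random() < 0.5

def estimate_true_proportion(reports: list[bool], p: float) -> float:
    """Debias the observed "Yes" rate (hypothetical helper).

    Uses E[observed] = p * pi + (1 - p) / 2, solved for pi.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    observed = sum(reports) / len(reports)
    estimate = (observed - (1 - p) / 2) / p
    # Sampling noise can push the raw estimate slightly outside [0, 1].
    return min(1.0, max(0.0, estimate))

rng = random.Random(0)   # fixed seed so the simulation is repeatable
true_rate = 0.3          # assumed population "Yes" rate, for illustration only
n = 100_000              # assumed number of respondents
p = 0.7

true_answers = [rng.random() < true_rate for _ in range(n)]
reports = [randomized_response(answer, p, rng) for answer in true_answers]

observed_rate = sum(reports) / n
estimated_rate = estimate_true_proportion(reports, p)
print(f"Observed 'Yes' rate:  {observed_rate:.3f}")
print(f"Debiased estimate:    {estimated_rate:.3f}")
```

The observed rate is pulled toward 0.5 by the random answers, while the debiased estimate should land close to the assumed true rate. The correction divides by p, so smaller values of p amplify sampling noise and require more respondents for the same accuracy.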
Study More

The privacy guarantee of randomized response can be analyzed using epsilon-local differential privacy. By choosing the probability $p$, you control the privacy parameter epsilon, which bounds how much the reported answer can reveal about the true answer: a smaller $p$ gives a smaller epsilon and stronger privacy, at the cost of noisier estimates. For a deeper treatment, see Dwork & Roth's "The Algorithmic Foundations of Differential Privacy" (2014), Section 3.2.
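To make the link between p and epsilon concrete: under the variant described on this page, a respondent whose true answer is "Yes" reports "Yes" with probability (1 + p) / 2, while one whose true answer is "No" reports "Yes" with probability (1 - p) / 2. The ratio of these two probabilities is (1 + p) / (1 - p), which gives epsilon = ln((1 + p) / (1 - p)). The sketch below computes this relationship in both directions; the helper names `epsilon_from_p` and `p_from_epsilon` are hypothetical and introduced only for illustration.

```python
import math

def epsilon_from_p(p: float) -> float:
    """Privacy parameter of the protocol above for truth probability p (hypothetical helper)."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    # Worst-case likelihood ratio between the two possible true answers.
    return math.log((1 + p) / (1 - p))

def p_from_epsilon(epsilon: float) -> float:
    """Inverse mapping: the truth probability p that yields a target epsilon."""
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    return (math.exp(epsilon) - 1) / (math.exp(epsilon) + 1)

for p in (0.1, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f} -> epsilon = {epsilon_from_p(p):.3f}")
```

With p = 0.5 this gives epsilon = ln 3, and with the default p = 0.7 used in the example code it gives roughly 1.73. As p approaches 1, epsilon grows without bound, reflecting that fully truthful reports offer no privacy.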

1. Which of the following best describes how randomized response protects individual privacy in a survey?

2. If you collect many randomized responses to a binary question using the randomized response protocol, what must you do to accurately estimate the true proportion of "Yes" answers in the population?



Section 3. Chapter 3
