
Query Privacy: Counts, Sums, and Histograms

When you apply differential privacy (DP) to real-world data analysis, you often want to release useful aggregate information—such as counts, sums, or histograms—while protecting individual privacy. These types of queries are common in data science and statistics, but each has its own privacy risks and must be handled carefully to ensure that no single person’s data can be inferred from the released results.

The core concept enabling DP for these queries is global sensitivity, which measures the maximum change in a query’s output, over all possible datasets, when a single individual’s data is added or removed. Sensitivity directly determines how much noise you need to add to achieve a given privacy guarantee. For counts and sums, the global sensitivity is often straightforward. For counting queries (like “How many users live in New York?”), the sensitivity is 1, since one person can change the count by at most one. For sum queries, the sensitivity equals the largest absolute value any one individual can contribute, so contributions must be bounded (for example, by clipping each value to a fixed range) before the sum is released. If values are clipped to the range [0, 100], the sensitivity of the sum is 100.
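The count and sum cases described above can be sketched as follows. This is a minimal illustration rather than a production mechanism: the function names `dp_count`, `dp_sum`, and `sum_sensitivity`, the clipping bounds `lower` and `upper`, and the sample data are all assumptions made for this example. It also assumes neighboring datasets differ by adding or removing one person.

```python
import numpy as np

def dp_count(values, predicate, epsilon, rng=None):
    """Noisy count of the items that satisfy `predicate` (sensitivity 1)."""
    rng = np.random.default_rng() if rng is None else rng
    true_count = sum(1 for v in values if predicate(v))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

def sum_sensitivity(lower, upper):
    """Largest absolute contribution one person can make after clipping."""
    return max(abs(lower), abs(upper))

def dp_sum(values, lower, upper, epsilon, rng=None):
    """Noisy sum of values clipped to [lower, upper]."""
    rng = np.random.default_rng() if rng is None else rng
    # Clipping bounds each person's contribution, which bounds the sensitivity.
    clipped = np.clip(np.asarray(values, dtype=float), lower, upper)
    scale = sum_sensitivity(lower, upper) / epsilon
    return clipped.sum() + rng.laplace(loc=0.0, scale=scale)

# Hypothetical data: the value 150 is an outlier that clipping caps at 100.
rng = np.random.default_rng(0)
ages = [23, 35, 41, 29, 62, 150]
print("Noisy count (age >= 30):", dp_count(ages, lambda a: a >= 30, 1.0, rng))
print("Noisy sum (clipped to [0, 100]):", dp_sum(ages, 0, 100, 1.0, rng))
```

Note that the sum needs 100 times more noise than the count at the same epsilon, because one person can shift the clipped sum by up to 100 but the count by only 1. A tighter clipping range reduces the noise at the cost of biasing the sum when real values fall outside the range.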

Histograms summarize how data is distributed across categories or bins. For example, a histogram might show how many people fall into different age ranges. In the context of DP, you must consider how much one person can change the histogram; this quantity is the histogram sensitivity. The Laplace mechanism is commonly used to add independent noise to each bin, making the released histogram differentially private. The noise is drawn from a Laplace distribution with scale equal to the sensitivity divided by your chosen privacy budget (epsilon), so a smaller epsilon gives stronger privacy but noisier counts.

```python
import numpy as np

def dp_histogram(counts, epsilon, sensitivity=1):
    """
    Releases a differentially private histogram using Laplace noise.

    Args:
        counts (list or np.ndarray): The true counts for each histogram bin.
        epsilon (float): The privacy budget parameter.
        sensitivity (float): The global sensitivity for each bin (default 1).

    Returns:
        np.ndarray: The noisy counts for each bin.
    """
    scale = sensitivity / epsilon
    noise = np.random.laplace(loc=0, scale=scale, size=len(counts))
    dp_counts = np.array(counts) + noise
    return dp_counts

# Example usage:
true_counts = [30, 45, 25, 10]  # Example histogram bins
epsilon = 1.0
dp_counts = dp_histogram(true_counts, epsilon)
print("Noisy histogram:", dp_counts)
```
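To get a feel for how the privacy budget trades off against accuracy, the sketch below simulates many releases at several epsilon values and reports the average absolute error per bin. The helper name `laplace_scale`, the chosen epsilon values, and the number of simulated releases are assumptions for illustration. For Laplace noise, the expected absolute error equals the scale, so the simulated values should land close to sensitivity / epsilon.

```python
import numpy as np

def laplace_scale(sensitivity, epsilon):
    """Scale parameter of the Laplace noise for a given sensitivity and epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon

rng = np.random.default_rng(42)
true_counts = np.array([30, 45, 25, 10])  # same example bins as above

for epsilon in (0.1, 1.0, 10.0):
    scale = laplace_scale(1.0, epsilon)
    # Simulate 10,000 independent noisy releases of the 4-bin histogram.
    noise = rng.laplace(loc=0.0, scale=scale, size=(10_000, true_counts.size))
    mean_abs_error = np.abs(noise).mean()
    print(f"epsilon={epsilon:>4}: scale={scale:.2f}, "
          f"mean |error| per bin ~ {mean_abs_error:.2f}")
```

With bins as small as 10, an epsilon of 0.1 produces noise on the same order as the counts themselves, which is worth checking before settling on a budget for a small dataset.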
Definition

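Histogram sensitivity is the maximum amount by which a single individual can change the output of a histogram query, measured as the total (L1) change summed over all bins. For a standard histogram where each person contributes to only one bin, the sensitivity is 1 when neighboring datasets differ by adding or removing one person, because that person affects only one bin’s count by at most one. This is why adding Laplace noise with scale 1/epsilon to every bin gives an epsilon-differentially private histogram without splitting the budget across bins. If neighboring datasets are instead defined by replacing one person’s record, one bin can go down while another goes up, so the sensitivity is 2.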
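As a small check of this definition, the sketch below builds a histogram for a toy dataset and measures the total (L1) change across bins for two kinds of neighboring dataset. The `histogram` helper, the bin labels, and the records are hypothetical and exist only for this example.

```python
import numpy as np

def histogram(records, bins):
    """Count how many records fall into each named bin (one bin per record)."""
    return np.array([sum(1 for r in records if r == b) for b in bins])

bins = ["0-17", "18-39", "40-64", "65+"]
dataset = ["18-39", "40-64", "18-39", "65+", "0-17"]

# Neighbor by removal: the last person is dropped from the dataset.
neighbor_removed = dataset[:-1]
# Neighbor by replacement: the last person moves from "0-17" to "65+".
neighbor_replaced = dataset[:-1] + ["65+"]

base = histogram(dataset, bins)
l1_removed = int(np.abs(base - histogram(neighbor_removed, bins)).sum())
l1_replaced = int(np.abs(base - histogram(neighbor_replaced, bins)).sum())

print("L1 change when one person is removed: ", l1_removed)   # 1
print("L1 change when one person is replaced:", l1_replaced)  # 2
```

The removal case matches the sensitivity of 1 stated in the definition. The replacement case changes two bins by one each, so a mechanism calibrated for that neighboring definition would use a sensitivity of 2.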

