Learn Query Privacy: Counts, Sums, and Histograms | Differential Privacy Mechanisms
Data Privacy and Differential Privacy Fundamentals

Query Privacy: Counts, Sums, and Histograms

When you apply differential privacy (DP) to real-world data analysis, you often want to release useful aggregate information, such as counts, sums, or histograms, while protecting individual privacy. These queries are common in data science and statistics, but each carries its own privacy risks and must be handled carefully so that no single person’s data can be inferred from the released results.

The core concept enabling DP for these queries is global sensitivity, which measures the maximum change in a query’s output when a single individual’s data is added or removed. Sensitivity directly determines how much noise you need to add to achieve a given privacy guarantee: the higher the sensitivity, the more noise is required. For counting queries (like “How many users live in New York?”), the sensitivity is 1, since adding or removing one person changes the count by at most one. For sum queries, the sensitivity depends on the possible range of each individual’s contribution: if every value is clipped to an interval [a, b], adding or removing one person changes the sum by at most max(|a|, |b|). Without such a bound, the sensitivity of a sum is unbounded.
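The count and sum cases described above can be sketched as follows. This is a minimal illustration, not a production mechanism: the function names `dp_count`, `dp_sum`, and `sum_sensitivity`, the clipping bounds, and the example data are all assumptions made for this sketch, and it assumes neighboring datasets differ by adding or removing one person.

```python
import numpy as np


def sum_sensitivity(lower, upper):
    """Sensitivity of a sum whose values are clipped to [lower, upper],
    assuming neighbors differ by adding or removing one person."""
    return max(abs(lower), abs(upper))


def dp_count(values, predicate, epsilon, rng):
    """Noisy count of the records that satisfy `predicate` (sensitivity 1)."""
    true_count = sum(1 for v in values if predicate(v))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)


def dp_sum(values, lower, upper, epsilon, rng):
    """Noisy sum after clipping every value to [lower, upper]."""
    clipped = np.clip(np.asarray(values, dtype=float), lower, upper)
    scale = sum_sensitivity(lower, upper) / epsilon
    return clipped.sum() + rng.laplace(loc=0.0, scale=scale)


# Example usage with made-up purchase amounts (hypothetical data):
rng = np.random.default_rng(0)
purchases = [12.5, 80.0, 250.0, 40.0]
print("Noisy count of purchases over 50:",
      dp_count(purchases, lambda v: v > 50, epsilon=1.0, rng=rng))
print("Noisy sum, values clipped to [0, 100]:",
      dp_sum(purchases, lower=0, upper=100, epsilon=1.0, rng=rng))
```

Clipping trades accuracy for a bounded sensitivity: the value 250.0 is counted as 100, but in exchange the noise scale for the sum is fixed at 100 / epsilon rather than depending on the data.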

Histograms are a powerful way to summarize how data is distributed across categories or bins. For example, a histogram might show how many people fall into different age ranges. In the context of DP, you must consider how much one person can change the histogram; this is where histogram sensitivity comes in. The Laplace mechanism is commonly used to add independent noise to each bin, making the released histogram differentially private. The noise scale is the sensitivity divided by your chosen privacy budget (epsilon), so a smaller epsilon means more noise and stronger privacy.

```python
import numpy as np


def dp_histogram(counts, epsilon, sensitivity=1):
    """
    Releases a differentially private histogram using Laplace noise.

    Args:
        counts (list or np.ndarray): The true counts for each histogram bin.
        epsilon (float): The privacy budget parameter.
        sensitivity (float): The global sensitivity for each bin (default 1).

    Returns:
        np.ndarray: The noisy counts for each bin.
    """
    scale = sensitivity / epsilon
    noise = np.random.laplace(loc=0, scale=scale, size=len(counts))
    dp_counts = np.array(counts) + noise
    return dp_counts


# Example usage:
true_counts = [30, 45, 25, 10]  # Example histogram bins
epsilon = 1.0
dp_counts = dp_histogram(true_counts, epsilon)
print("Noisy histogram:", dp_counts)
```
Definition: Histogram sensitivity is the maximum amount by which a single individual can change the output of a histogram query, measured as the total change summed over all bins. For a standard histogram where each person contributes to only one bin, the sensitivity is 1, because adding or removing one person changes exactly one bin’s count by one and leaves every other bin unchanged. If a person could contribute to up to k bins, the sensitivity would grow to k.
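The sensitivity claim can be checked on a small example. The sketch below builds a histogram from hypothetical ages, removes one person, and measures the total change across bins. The helper names `histogram_counts` and `l1_distance`, the bin edges, and the data are assumptions chosen for illustration.

```python
import numpy as np


def histogram_counts(ages, bin_edges):
    """Count how many ages fall into each bin defined by `bin_edges`."""
    counts, _ = np.histogram(ages, bins=bin_edges)
    return counts


def l1_distance(a, b):
    """Total absolute difference between two histograms, summed over bins."""
    return int(np.abs(np.asarray(a) - np.asarray(b)).sum())


bin_edges = [0, 18, 35, 50, 65, 120]   # hypothetical age ranges
ages = [12, 25, 31, 47, 52, 70, 33]    # hypothetical dataset
neighbor = ages[:-1]                   # same dataset with one person removed

full = histogram_counts(ages, bin_edges)
reduced = histogram_counts(neighbor, bin_edges)

print("Full histogram:   ", full)
print("Neighbor histogram:", reduced)
print("L1 distance:", l1_distance(full, reduced))  # 1: only one bin changed, by one
```

Because each person lands in exactly one bin, the distance is 1 no matter which record is removed, which is why a noise scale of 1 / epsilon per bin suffices under this neighbor definition.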

