Learn Query Privacy: Counts, Sums, and Histograms

When you apply differential privacy (DP) to real-world data analysis, you often want to release useful aggregate information, such as counts, sums, or histograms, while protecting individual privacy. These queries are common in data science and statistics, but each can leak information about individuals in its own way, so each must be handled carefully to ensure that no single person’s data can be inferred from the released results.

The core concept enabling DP for these queries is global sensitivity, which measures the maximum change in a query’s output when a single individual’s data is added or removed. Sensitivity directly determines how much noise you need to add to achieve a given privacy guarantee. For counts and sums, the global sensitivity is often straightforward. For counting queries (like “How many users live in New York?”), the sensitivity is 1, since one person can change the count by at most one. For sum queries, the sensitivity depends on the possible range of each individual’s contribution: if every value lies in [0, B], one person can change the sum by at most B, so the sensitivity is B.
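The count and sum cases above can be sketched in a few lines. This is a minimal illustration rather than a production implementation: the helper names `dp_count`, `dp_sum`, and `sum_sensitivity`, the clipping bounds, and the sample data are all assumptions made for the example. It assumes the add-or-remove definition of neighboring datasets used in this chapter and that the clipping bounds are chosen without looking at the data.

```python
import numpy as np

def sum_sensitivity(lower, upper):
    """Largest change one person can cause in a sum of values clipped to [lower, upper]."""
    return max(abs(lower), abs(upper))

def dp_count(values, predicate, epsilon, rng=None):
    """Noisy count of the records that satisfy `predicate` (sensitivity 1)."""
    rng = np.random.default_rng() if rng is None else rng
    true_count = sum(1 for v in values if predicate(v))
    # One person changes the count by at most 1, so the Laplace scale is 1 / epsilon.
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

def dp_sum(values, lower, upper, epsilon, rng=None):
    """Noisy sum of the values after clipping each one to [lower, upper]."""
    rng = np.random.default_rng() if rng is None else rng
    # Clipping bounds each individual's contribution, which bounds the sensitivity.
    clipped = np.clip(np.asarray(values, dtype=float), lower, upper)
    scale = sum_sensitivity(lower, upper) / epsilon
    return clipped.sum() + rng.laplace(loc=0.0, scale=scale)

# Hypothetical data: home cities and purchase amounts.
cities = ["New York", "Boston", "New York", "Chicago"]
purchases = [10.0, 250.0, 40.0]

print("Noisy count:", dp_count(cities, lambda c: c == "New York", epsilon=1.0))
print("Noisy sum:", dp_sum(purchases, lower=0.0, upper=100.0, epsilon=1.0))
```

Note that the sum query needs far more noise than the count at the same epsilon, because its sensitivity is 100 rather than 1. A tighter clipping range reduces the noise but biases the sum when real values fall outside it.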

Histograms are a powerful way to summarize how data is distributed across categories or bins. For example, a histogram might show how many people fall into different age ranges. In the context of DP, you must consider how much one person can change the histogram; this is where histogram sensitivity comes in. The Laplace mechanism is commonly used to add independent noise to each bin, making the released histogram differentially private. The noise scale is the sensitivity divided by your chosen privacy budget (epsilon), so a smaller epsilon means stronger privacy and noisier counts.


import numpy as np

def dp_histogram(counts, epsilon, sensitivity=1):
    """
    Releases a differentially private histogram using Laplace noise.

    Args:
        counts (list or np.ndarray): The true counts for each histogram bin.
        epsilon (float): The privacy budget parameter (must be positive).
        sensitivity (float): The global sensitivity of the histogram (default 1).

    Returns:
        np.ndarray: The noisy counts for each bin.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    counts = np.asarray(counts, dtype=float)
    # Laplace scale: smaller epsilon (stronger privacy) means more noise.
    scale = sensitivity / epsilon
    # Draw independent Laplace noise for every bin.
    noise = np.random.laplace(loc=0, scale=scale, size=counts.shape)
    return counts + noise

# Example usage:
true_counts = [30, 45, 25, 10]  # Example histogram bins
epsilon = 1.0
dp_counts = dp_histogram(true_counts, epsilon)
print("Noisy histogram:", dp_counts)

Definition

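Histogram sensitivity is the maximum amount by which a single individual can change the output of a histogram query, measured as the total change summed over all bins. For a standard histogram where each person contributes to only one bin, the sensitivity is 1, because adding or removing one person changes exactly one bin’s count by one and leaves every other bin untouched. This is why adding Laplace noise with scale 1/epsilon to every bin is enough to make the whole histogram epsilon-differentially private, no matter how many bins it has.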
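The definition can be checked empirically on a small dataset. The sketch below is illustrative only: the ages, the bin edges, and the helper names `age_histogram` and `l1_distance` are hypothetical. It removes each person in turn and measures how far the histogram moves in total across all bins.

```python
import numpy as np

def age_histogram(ages, bin_edges):
    """Count how many ages fall into each bin."""
    counts, _ = np.histogram(ages, bins=bin_edges)
    return counts

def l1_distance(h1, h2):
    """Total absolute change across all bins."""
    return int(np.abs(np.asarray(h1) - np.asarray(h2)).sum())

# Hypothetical data: every age falls inside the overall bin range.
ages = [23, 35, 41, 19, 67, 52, 35, 28]
bin_edges = [0, 30, 50, 120]

full = age_histogram(ages, bin_edges)

# Remove each person in turn and record how much the histogram changes.
changes = [
    l1_distance(full, age_histogram(ages[:i] + ages[i + 1:], bin_edges))
    for i in range(len(ages))
]

print("Full histogram:", full)
print("Largest change from removing one person:", max(changes))
```

Every removal changes exactly one bin by one, so the largest change is 1, which matches the sensitivity stated in the definition. If a person could appear in several bins (for example, one row per purchase rather than one row per user), the sensitivity would grow accordingly.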

Section 2. Chapter 4
