
Releasing Aggregate DP Statistics

When releasing aggregate statistics from sensitive datasets, such as counts or means, you must take care to protect individual privacy. Differential privacy (DP) provides a mathematically rigorous way to do this by adding carefully calibrated noise to the statistics before releasing them. The most common mechanisms for this are the Laplace and Gaussian mechanisms, which ensure that the released statistic does not reveal whether any single individual's data was present in the dataset.
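
As a minimal sketch of how this works in practice, the snippet below applies the Laplace mechanism to a simple count query; the dataset and epsilon value are hypothetical. A count has global sensitivity 1, because adding or removing one person changes it by at most 1.

import numpy as np

def dp_count(values, epsilon):
    # A count changes by at most 1 when one individual's record is
    # added or removed, so its global sensitivity is 1.
    sensitivity = 1.0
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return len(values) + noise

# Hypothetical data: ages of survey respondents.
ages = [34, 29, 41, 52, 38]
print(dp_count(ages, epsilon=0.5))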

Best practices for releasing DP aggregates include selecting the smallest reasonable value for the privacy parameter epsilon (ε) that still allows your statistic to be useful. Lower epsilon values provide stronger privacy guarantees, but also introduce more noise, which can reduce the accuracy of your results. You should always report the level of noise added and the corresponding accuracy or confidence interval of your released statistic, so users can interpret the results appropriately.
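
One way to report accuracy is a noise bound derived from the Laplace distribution: noise with scale b stays within b·ln(1/α) of zero with probability 1 − α. The helper below is a sketch using that fact; the function name and parameters are illustrative, not part of any library.

import numpy as np

def laplace_error_bound(sensitivity, epsilon, confidence=0.95):
    # Laplace noise with scale b = sensitivity / epsilon satisfies
    # P(|noise| <= b * ln(1 / (1 - confidence))) = confidence.
    scale = sensitivity / epsilon
    return scale * np.log(1.0 / (1.0 - confidence))

# For a count query (sensitivity 1) at epsilon = 0.5, the noisy count
# lands within this margin of the true count 95% of the time.
print(laplace_error_bound(sensitivity=1.0, epsilon=0.5))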

When releasing counts, means, or sums, you should compute the global sensitivity of your query: the maximum amount the aggregate could change if any one individual's data were added or removed. For example, for a mean of n values clipped to a known range, the sensitivity is the width of that range divided by n. Once you know the sensitivity and your chosen epsilon, you can add Laplace noise scaled accordingly, as the dp_mean function below does. Always document your assumptions, such as the value range and the total number of queries, as these affect both privacy and utility.
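
As a minimal sketch of these sensitivities, assuming values are clipped to a known range and treating the dataset size n as fixed for the mean (a common simplification, also used by dp_mean below), the hypothetical helper here returns the textbook values:

def global_sensitivity(query, n=None, value_range=(0.0, 1.0)):
    # Illustrative helper (hypothetical name): global sensitivities
    # for count, sum, and mean over bounded data.
    min_value, max_value = value_range
    if query == "count":
        return 1.0  # one record changes a count by at most 1
    if query == "sum":
        return max(abs(min_value), abs(max_value))  # largest single contribution
    if query == "mean":
        return (max_value - min_value) / n  # n fixed, values clipped
    raise ValueError(f"Unknown query type: {query}")

# Example: mean of 1000 incomes clipped to [0, 100000].
print(global_sensitivity("mean", n=1000, value_range=(0.0, 100000.0)))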

By following these best practices, you can release meaningful aggregate statistics while protecting the privacy of individuals in your dataset.

import numpy as np

def dp_mean(values, epsilon, value_range):
    """
    Releases a differentially private mean using the Laplace mechanism.

    Args:
        values (list of float): The data values.
        epsilon (float): Privacy budget parameter.
        value_range (tuple): (min_value, max_value) for sensitivity calculation.

    Returns:
        float: The noisy mean.
    """
    n = len(values)
    if n == 0:
        raise ValueError("Input list must not be empty.")
    min_value, max_value = value_range
    # Global sensitivity of the mean: one record can shift it by at most
    # (max_value - min_value) / n when values are clipped to the range.
    sensitivity = (max_value - min_value) / n
    true_mean = np.mean(values)
    # Laplace noise with scale = sensitivity / epsilon gives epsilon-DP.
    scale = sensitivity / epsilon
    noise = np.random.laplace(loc=0, scale=scale)
    return true_mean + noise
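
A usage sketch of dp_mean; the data and clipping range below are hypothetical:

# Hypothetical example: ages clipped to the assumed range [18, 90].
ages = [34, 29, 41, 52, 38, 61, 27]
noisy_mean = dp_mean(ages, epsilon=1.0, value_range=(18, 90))
print(f"DP mean age: {noisy_mean:.2f}")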




