Privacy Budget and Composition | Differential Privacy Mechanisms


Understanding how privacy loss accumulates is crucial when you apply differential privacy in practice. Every query you run against a differentially private database consumes part of the overall privacy budget. The budget is typically expressed through the parameter $ε$, which bounds the maximum allowable privacy loss. As you keep issuing queries, their individual privacy losses add up, so you must make sure the cumulative loss never exceeds your allotted budget.

Composition theorems help you understand and manage this process. The most basic is the sequential composition theorem: if you run multiple queries on the same data, each with its own privacy loss ($ε_1, ε_2, \dots$), the total privacy loss is at most their sum ($ε_1 + ε_2 + \dots$). In other words, the privacy guarantee weakens linearly with each additional query. Parallel composition, by contrast, applies when you run queries on disjoint subsets of the data. In that case, the overall privacy loss is bounded by the largest single $ε$, not the sum, because each individual's record falls into only one subset and is therefore affected by only one of the queries.
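Both composition rules reduce to simple arithmetic over the per-query $ε$ values. The minimal sketch below assumes every query in the list either touches the same data (sequential) or a distinct disjoint subset (parallel); the function names `sequential_epsilon` and `parallel_epsilon` are illustrative and not part of any library.

```python
import math
from typing import Sequence


def sequential_epsilon(epsilons: Sequence[float]) -> float:
    """Basic sequential composition: ε values of queries on the same data add up."""
    # math.fsum avoids accumulating rounding error across many small ε values
    return math.fsum(epsilons)


def parallel_epsilon(epsilons: Sequence[float]) -> float:
    """Parallel composition: queries on disjoint subsets cost only the largest ε."""
    return max(epsilons, default=0.0)


per_query = [0.25, 0.25, 0.5]
print(sequential_epsilon(per_query))  # 1.0
print(parallel_epsilon(per_query))    # 0.5
```

The same three queries cost twice as much under sequential composition as under parallel composition, which is why partitioning the data can stretch a fixed budget.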

In practice, you need to allocate your privacy budget carefully across all the analyses you intend to run. Once the budget is used up, any further query on the same data would push the cumulative privacy loss past the guarantee you committed to. This makes planning and tracking queries essential in any differentially private system.
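One way to plan is to fix the total $ε$ up front and split it across the intended analyses before running any of them. The sketch below divides a total budget in proportion to importance weights; the helper `allocate_budget`, the analysis names, and the weights are all hypothetical placeholders.

```python
import math


def allocate_budget(total_epsilon: float, weights: dict[str, float]) -> dict[str, float]:
    """Split total_epsilon across analyses in proportion to their weights (hypothetical helper)."""
    if total_epsilon <= 0:
        raise ValueError("total_epsilon must be positive")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    weight_sum = math.fsum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive number")
    return {name: total_epsilon * w / weight_sum for name, w in weights.items()}


# Hypothetical plan: the mean gets twice the budget of each count
plan = allocate_budget(1.0, {"count_by_region": 1, "count_by_age": 1, "mean_income": 2})
for name, eps in plan.items():
    print(f"{name}: ε = {eps:.2f}")

# Under sequential composition the planned spend must not exceed the total
assert math.fsum(plan.values()) <= 1.0 + 1e-9
```

A proportional split is only one design choice; in practice you might also hold back part of the budget as a reserve for follow-up questions.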

Sequential composition

Suppose you are allowed a total privacy budget of $ε = 1.0$. If you make three queries, each using $ε = 0.3$, the total consumed is $0.9$, leaving you with $0.1$ to spend. If you try to make a fourth query with $ε = 0.3$, you would exceed your budget (total $1.2$), breaking your privacy guarantee.
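The scenario above can be replayed with a small budget tracker. This is a minimal sketch: the `BudgetAccountant` class is hypothetical, not taken from a differential privacy library. It stores amounts as `fractions.Fraction` built from decimal strings, so that $0.3 + 0.3 + 0.3$ is exactly $0.9$ rather than a floating-point approximation, which matters when the spend lands right at the limit.

```python
from fractions import Fraction


class BudgetAccountant:
    """Tracks ε spent under basic sequential composition (hypothetical helper)."""

    def __init__(self, total_epsilon: str) -> None:
        # Decimal strings such as "1.0" convert to exact rationals
        self.total = Fraction(total_epsilon)
        self.spent = Fraction(0)

    @property
    def remaining(self) -> Fraction:
        return self.total - self.spent

    def spend(self, epsilon: str) -> None:
        """Charge one query's ε, refusing any query that would overrun the budget."""
        cost = Fraction(epsilon)
        if self.spent + cost > self.total:
            raise RuntimeError(
                f"query needs ε={float(cost)} but only ε={float(self.remaining)} remains"
            )
        self.spent += cost


accountant = BudgetAccountant("1.0")
for _ in range(3):
    accountant.spend("0.3")
print(float(accountant.spent), float(accountant.remaining))  # 0.9 0.1

try:
    accountant.spend("0.3")  # a fourth query would bring the total to 1.2
except RuntimeError as err:
    print("rejected:", err)
```

The rejected query leaves the spent amount unchanged, so the accountant still reports $0.1$ available for a smaller query.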

Parallel composition

Imagine you split your dataset into two disjoint parts: A and B. You run a query on A with $ε = 0.4$ and a different query on B with $ε = 0.6$. Because the data subsets do not overlap, your total privacy loss is $\max(0.4, 0.6) = 0.6$, not $1.0$. This lets you use your budget more efficiently when working with disjoint groups.
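Parallel composition can be tracked per partition. In the sketch below, which is an illustrative helper rather than a library API (the class name `PartitionedAccountant` is hypothetical), queries within one partition still compose sequentially, while the overall privacy loss is the maximum across partitions. The labels "A" and "B" mirror the example.

```python
from collections import defaultdict


class PartitionedAccountant:
    """Per-partition ε tracking for queries on disjoint data subsets (hypothetical helper)."""

    def __init__(self) -> None:
        self.spent: dict[str, float] = defaultdict(float)

    def spend(self, partition: str, epsilon: float) -> None:
        # Queries on the same partition compose sequentially, so their ε values add up
        self.spent[partition] += epsilon

    def total_epsilon(self) -> float:
        # Partitions are disjoint: each individual appears in at most one,
        # so the overall privacy loss is the largest per-partition total
        return max(self.spent.values(), default=0.0)


acct = PartitionedAccountant()
acct.spend("A", 0.4)
acct.spend("B", 0.6)
print(acct.total_epsilon())  # 0.6
```

A query that scans the whole dataset touches every partition, so under this scheme it would have to be charged to each of them.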


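The script below plots how the remaining budget shrinks under sequential composition when every query spends the same ε.

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulate privacy budget depletion under sequential composition:
# every query spends the same ε from one shared total budget.
total_budget = 1.0
epsilon_per_query = 0.2
# Round before truncating so floating-point error (e.g. 4.9999999) cannot drop a query
max_queries = int(round(total_budget / epsilon_per_query, 9))

# Remaining budget after 0, 1, ..., max_queries + 1 queries; the final step
# deliberately overspends to show what exceeding the budget looks like
queries = np.arange(max_queries + 2)
budgets = total_budget - queries * epsilon_per_query

plt.step(queries, budgets, where='post')
plt.axhline(0, color='red', linestyle='--', label='Budget exhausted')
plt.xlabel("Number of Queries")
plt.ylabel("Remaining Privacy Budget (ε)")
plt.title("Privacy Budget Depletion Over Repeated Queries")
# Extend the y-axis below zero so the overspent step stays visible
plt.ylim(min(budgets.min(), 0.0) - 0.1, total_budget + 0.1)
plt.legend()
plt.grid(True)
plt.show()
```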

1. Which statement best describes the management of a privacy budget when using differential privacy?

2. What is the main risk of exceeding your privacy budget in a differentially private system?


Section 2. Chapter 5
