Data Privacy and Differential Privacy Fundamentals

Privacy Budget and Composition

Understanding how privacy loss accumulates is crucial when you use differential privacy in practice. Each time you query a differentially private database, you consume part of the overall privacy budget. The privacy budget is typically represented by the parameter ε, which bounds the maximum allowable privacy loss. As you keep making queries, the total privacy loss adds up, and you must ensure that it does not exceed your allotted budget.

Composition theorems help you understand and manage this process. The most basic is the sequential composition theorem: if you make multiple queries on the same data, each with its own privacy loss (ε₁, ε₂, ...), the total privacy loss is at most the sum (ε₁ + ε₂ + ...). This means that the privacy guarantee degrades linearly with each additional query. Parallel composition, by contrast, applies when the queries run on disjoint subsets of the data. In this case, the overall privacy loss is determined by the largest single ε, not the sum, because each individual's record appears in only one subset and is therefore affected by only one query.
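The two rules above can be written as a pair of small helpers. This is a minimal sketch; the function names `sequential_epsilon` and `parallel_epsilon` are illustrative and do not come from any particular library.

```python
def sequential_epsilon(epsilons):
    """Total privacy loss when every query runs on the same data: the sum."""
    return sum(epsilons)


def parallel_epsilon(epsilons):
    """Total privacy loss when each query runs on a disjoint subset: the maximum."""
    return max(epsilons, default=0.0)


losses = [0.1, 0.25, 0.5]
print("sequential:", sequential_epsilon(losses))
print("parallel:  ", parallel_epsilon(losses))
```

The same list of per-query losses costs much less under parallel composition, but only when the subsets really are disjoint.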

In practice, this means you need to carefully allocate your privacy budget across all intended analyses. If you use up your privacy budget, you cannot safely make further queries without risking privacy guarantees. This makes planning and tracking your queries essential in any differentially private system.
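One simple way to plan ahead is to split the total budget across the planned analyses in proportion to how much accuracy each one needs. The sketch below assumes a hypothetical helper `allocate_budget` and made-up analysis names. Proportional weighting is just one possible allocation policy, not a prescribed one.

```python
def allocate_budget(total_epsilon, weights):
    """Split total_epsilon across analyses in proportion to their weights.

    `weights` maps an analysis name to a positive number; a larger weight
    means a larger share of the budget (and therefore less noise).
    """
    if total_epsilon <= 0:
        raise ValueError("total_epsilon must be positive")
    if not weights or any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be non-empty and positive")
    total_weight = sum(weights.values())
    return {name: total_epsilon * w / total_weight for name, w in weights.items()}


# Hypothetical analyses planned against the same dataset (sequential composition)
plan = allocate_budget(1.0, {"mean_age": 2, "income_histogram": 2, "region_count": 1})
for name, eps in plan.items():
    print(f"{name}: epsilon = {eps:.2f}")
```

Because these analyses all touch the same data, their shares add up under sequential composition, so the allocation sums to the total budget by construction.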

Sequential composition

Suppose you are allowed a total privacy budget of ε = 1.0. If you make three queries, each using ε = 0.3, the total consumed is 0.9, leaving you with 0.1 to spend. If you try to make a fourth query with ε = 0.3, the total would reach 1.2, which exceeds your budget, so the ε = 1.0 guarantee would no longer hold.
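The example above can be enforced in code with a small budget tracker. This is a sketch under stated assumptions: the class `PrivacyBudget` and its `spend` method are hypothetical names, and refusing the query by raising an exception is one possible policy.

```python
class PrivacyBudget:
    """Tracks cumulative privacy loss under sequential composition."""

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    @property
    def remaining(self):
        return self.total - self.spent

    def spend(self, epsilon):
        """Record a query's cost, refusing it if it would exceed the budget."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so floating-point rounding does not reject a query
        # that would exactly exhaust the budget.
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError("privacy budget exceeded")
        self.spent += epsilon


budget = PrivacyBudget(1.0)
for _ in range(3):
    budget.spend(0.3)  # three queries at epsilon = 0.3 each
print("remaining:", round(budget.remaining, 10))

try:
    budget.spend(0.3)  # a fourth query would bring the total to 1.2
    fourth_allowed = True
except RuntimeError:
    fourth_allowed = False
print("fourth query allowed:", fourth_allowed)
```

Checking the budget before running the query, rather than after, means a refused query never touches the data.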

Parallel composition

Imagine you split your dataset into two disjoint parts: A and B. You run a query on A with ε = 0.4 and a different query on B with ε = 0.6. Because the data subsets do not overlap, your total privacy loss is max(0.4, 0.6) = 0.6, not 1.0. This lets you use your budget more efficiently when working with disjoint groups.
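The sketch below makes the A/B example concrete with noisy counts. It assumes the Laplace mechanism for a counting query (sensitivity 1), which this section does not itself specify. The data, the split at age 40, and the helper names are all made up for illustration. The noise sampler is a teaching sketch and is not hardened for production use.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with rate 1/scale
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, epsilon, rng):
    # Counting query: adding or removing one record changes the count
    # by at most 1, so the noise scale is 1 / epsilon.
    return len(records) + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)
ages = [23, 35, 41, 52, 29, 64, 38, 47]   # illustrative data
part_a = [a for a in ages if a < 40]      # subset A
part_b = [a for a in ages if a >= 40]     # subset B, disjoint from A

count_a = noisy_count(part_a, 0.4, rng)
count_b = noisy_count(part_b, 0.6, rng)

# Each person falls in exactly one subset, so parallel composition applies.
total_epsilon = max(0.4, 0.6)
print(f"noisy count A: {count_a:.2f}, noisy count B: {count_b:.2f}")
print("total epsilon:", total_epsilon)
```

If the two subsets overlapped, anyone appearing in both would be affected by both queries, and the cost would fall back to the sequential sum of 1.0.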

import matplotlib.pyplot as plt

# Simulate privacy budget depletion over repeated queries
total_budget = 1.0
epsilon_per_query = 0.2
max_queries = int(total_budget / epsilon_per_query)

# Remaining budget after each query, clamped at zero so it never goes negative.
# One extra point keeps the final zero-budget step visible in the plot.
budgets = [max(0.0, total_budget - i * epsilon_per_query) for i in range(max_queries + 2)]
queries = list(range(len(budgets)))

plt.step(queries, budgets, where='post')
plt.xlabel("Number of Queries")
plt.ylabel("Remaining Privacy Budget (ε)")
plt.title("Privacy Budget Depletion Over Repeated Queries")
plt.ylim(0, total_budget + 0.1)
plt.grid(True)
plt.show()

1. Which statement best describes the management of a privacy budget when using differential privacy?

2. What is the main risk of exceeding your privacy budget in a differentially private system?


Section 2. Chapter 5
