Differentially Private Stochastic Gradient Descent (DP-SGD)
Differentially Private Stochastic Gradient Descent, or DP-SGD, is a powerful approach for training machine learning models while providing formal privacy guarantees for the data used. DP-SGD modifies the standard stochastic gradient descent (SGD) algorithm by introducing two key steps: gradient clipping and noise addition. These steps ensure that the contribution of any single data point to the model update is limited, and the overall training process is differentially private.
Differential privacy is formally defined as follows: a randomized algorithm A is (ϵ,δ)-differentially private if, for all datasets D and D′ differing on a single entry and for all measurable sets S,
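To build intuition for this definition, consider a minimal check using randomized response, a classic differentially private mechanism (not part of DP-SGD itself): each user reports their true bit with probability `p` and the flipped bit otherwise. The worst-case ratio of output probabilities on neighboring inputs determines $\epsilon$ (with $\delta = 0$ here). The parameter choice below is purely illustrative.

```python
import math

# Randomized response: report the true bit with probability p,
# the flipped bit with probability 1 - p.
p = 0.75

# Output distributions for two neighboring "datasets"
# (a single user whose true bit is 1 vs. 0).
pr_out1_given_true1 = p       # Pr[A(D) = 1]
pr_out1_given_true0 = 1 - p   # Pr[A(D') = 1]

# The worst-case probability ratio determines epsilon (delta = 0).
ratio = pr_out1_given_true1 / pr_out1_given_true0
epsilon = math.log(ratio)

print(f"ratio = {ratio:.2f}, so the mechanism is ({epsilon:.3f}, 0)-DP")
```

With `p = 0.75` the ratio is 3, so the mechanism satisfies $(\ln 3, 0)$-differential privacy: no single observed output lets an adversary distinguish the two neighboring inputs by more than a factor of 3.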
$$\Pr[A(D) \in S] \le e^{\epsilon}\,\Pr[A(D') \in S] + \delta$$

In DP-SGD, during each minibatch update, the gradient $g_i$ computed for each individual data sample is first clipped to a fixed norm $C$. This gradient clipping step ensures that no single sample can have a disproportionate influence on the model update, which is crucial for bounding the sensitivity of the training process. The clipped gradient for sample $i$ is:

$$\tilde{g}_i = g_i \cdot \min\left(1, \frac{C}{\|g_i\|_2}\right)$$

After clipping, the gradients are averaged:

$$\bar{g} = \frac{1}{L} \sum_{i=1}^{L} \tilde{g}_i$$

where $L$ is the minibatch size. Then, Gaussian noise is added to the averaged gradient to mask the presence or absence of any single data point. The noisy gradient is:

$$\hat{g} = \bar{g} + \mathcal{N}(0, \sigma^2 C^2 I)$$

Here, $\sigma$ is the noise multiplier, and $\mathcal{N}(0, \sigma^2 C^2 I)$ denotes Gaussian noise with mean zero and standard deviation $\sigma C$ per coordinate. This mechanism achieves $(\epsilon, \delta)$-differential privacy guarantees for the training process.
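The sensitivity bound that clipping provides can be verified numerically. The sketch below (not from the lesson itself; all names and values are illustrative) builds two minibatches that differ in a single example and checks that their averaged clipped gradients differ by at most $2C/L$ in $L_2$ norm, since each clipped per-example gradient has norm at most $C$:

```python
import numpy as np

rng = np.random.default_rng(0)
C, L, dim = 1.0, 8, 5  # clip norm, minibatch size, parameter count

def clip(g, C):
    # Scale g down so its L2 norm is at most C; leave small gradients alone.
    norm = np.linalg.norm(g)
    return g * min(1.0, C / norm) if norm > 0 else g

batch = rng.normal(size=(L, dim)) * 3.0           # minibatch D
batch_prime = batch.copy()
batch_prime[0] = rng.normal(size=dim) * 10.0      # D': one example replaced

avg = np.mean([clip(g, C) for g in batch], axis=0)
avg_prime = np.mean([clip(g, C) for g in batch_prime], axis=0)

# The averages differ only in the one replaced example's contribution,
# and each clipped contribution has norm <= C, so the gap is <= 2C/L.
diff = np.linalg.norm(avg - avg_prime)
print(f"||avg - avg'||_2 = {diff:.4f}, bound 2C/L = {2 * C / L:.4f}")
```

This bounded sensitivity is exactly what lets Gaussian noise of scale $\sigma C$ mask any single example's presence.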
To keep track of the cumulative privacy loss over multiple training steps, DP-SGD uses a technique called the moments accountant.
Moments accountant
The moments accountant is a mathematical tool used in DP-SGD to track and tightly bound the cumulative privacy loss (privacy budget) over many training steps. It allows for more accurate accounting of privacy guarantees than naïve composition, especially when training for many epochs.
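The advantage over naïve composition can be illustrated with a simplified, accountant-style calculation for the plain Gaussian mechanism. The sketch below ignores minibatch subsampling (which the full moments accountant also exploits for tighter bounds) and uses the Rényi-DP composition idea: each Gaussian step costs $\alpha / (2\sigma^2)$ at order $\alpha$, costs add over steps, and the total is then converted to an $(\epsilon, \delta)$ guarantee. All function names and parameter values are illustrative, not a real accounting library's API.

```python
import math

def rdp_to_eps(sigma, steps, delta, orders=range(2, 128)):
    # Renyi DP of one Gaussian step at order alpha is alpha / (2 * sigma^2);
    # RDP composes additively over steps, then converts to (eps, delta)-DP.
    best = float("inf")
    for alpha in orders:
        rdp = steps * alpha / (2 * sigma ** 2)
        eps = rdp + math.log(1 / delta) / (alpha - 1)
        best = min(best, eps)
    return best

def naive_eps(sigma, steps, delta_per_step):
    # Basic composition: the standard per-step (eps, delta) bound for the
    # Gaussian mechanism, summed over all steps.
    eps_step = math.sqrt(2 * math.log(1.25 / delta_per_step)) / sigma
    return steps * eps_step

sigma, T, delta = 8.0, 1000, 1e-5
print("naive composition: eps ≈", round(naive_eps(sigma, T, delta / T), 1))
print("RDP-style accounting: eps ≈", round(rdp_to_eps(sigma, T, delta), 1))
```

For many steps the accountant-style bound is orders of magnitude tighter than naïve summation, which is why DP-SGD can train for many epochs within a reasonable privacy budget.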
```python
import numpy as np

# Simulate a batch of per-example gradients (5 samples, 3 parameters each)
per_example_grads = np.array([
    [0.5, 2.0, 1.5],
    [1.2, -0.7, 0.3],
    [2.1, 0.0, -1.0],
    [-1.5, 0.9, 0.8],
    [0.4, -2.2, 0.5]
])

clip_norm = 1.0
noise_stddev = 0.5

# Step 1: Gradient clipping
def clip_gradients(grads, clip_norm):
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    scaling_factors = np.minimum(1.0, clip_norm / (norms + 1e-6))
    return grads * scaling_factors

clipped_grads = clip_gradients(per_example_grads, clip_norm)

# Step 2: Average the clipped gradients
avg_clipped_grad = np.mean(clipped_grads, axis=0)

# Step 3: Add Gaussian noise
noisy_grad = avg_clipped_grad + np.random.normal(0, noise_stddev, size=avg_clipped_grad.shape)

print("Clipped gradients:\n", clipped_grads)
print("Averaged clipped gradient:", avg_clipped_grad)
print("Noisy (DP) gradient:", noisy_grad)
```
1. What is the main purpose of gradient clipping in DP-SGD?
2. Which statement best describes the trade-off in DP-SGD between model accuracy and privacy?