Differentially Private Stochastic Gradient Descent (DP-SGD)
Differentially Private Stochastic Gradient Descent, or DP-SGD, is a powerful approach for training machine learning models while providing formal privacy guarantees for the data used. DP-SGD modifies the standard stochastic gradient descent (SGD) algorithm by introducing two key steps: gradient clipping and noise addition. These steps ensure that the contribution of any single data point to the model update is limited, and the overall training process is differentially private.
Differential privacy is formally defined as follows: a randomized algorithm A is (ϵ,δ)-differentially private if, for all datasets D and D′ differing on a single entry and for all measurable sets S,
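To build intuition for this definition, consider a minimal check using randomized response, a classic differentially private mechanism (not part of DP-SGD itself): each user reports their true bit with probability `p` and the flipped bit otherwise. The worst-case ratio of output probabilities on neighboring inputs determines $\epsilon$ (with $\delta = 0$ here). The parameter choice below is purely illustrative.

```python
import math

# Randomized response: report the true bit with probability p,
# the flipped bit with probability 1 - p.
p = 0.75

# Output distributions for two neighboring "datasets"
# (a single user whose true bit is 1 vs. 0).
pr_out1_given_true1 = p       # Pr[A(D) = 1]
pr_out1_given_true0 = 1 - p   # Pr[A(D') = 1]

# The worst-case probability ratio determines epsilon (delta = 0).
ratio = pr_out1_given_true1 / pr_out1_given_true0
epsilon = math.log(ratio)

print(f"ratio = {ratio:.2f}, so the mechanism is ({epsilon:.3f}, 0)-DP")
```

With `p = 0.75` the ratio is 3, so the mechanism satisfies $(\ln 3, 0)$-differential privacy: no single observed output lets an adversary distinguish the two neighboring inputs by more than a factor of 3.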
$$\Pr[A(D) \in S] \le e^{\epsilon}\,\Pr[A(D') \in S] + \delta$$

In DP-SGD, during each minibatch update, the gradient $g_i$ computed for each individual data sample is first clipped to a fixed norm $C$. This gradient clipping step ensures that no single sample can have a disproportionate influence on the model update, which is crucial for bounding the sensitivity of the training process. The clipped gradient for sample $i$ is:

$$\tilde{g}_i = g_i \cdot \min\left(1, \frac{C}{\|g_i\|_2}\right)$$

After clipping, the gradients are averaged:

$$\bar{g} = \frac{1}{L} \sum_{i=1}^{L} \tilde{g}_i$$

where $L$ is the minibatch size. Then, Gaussian noise is added to the averaged gradient to mask the presence or absence of any single data point. The noisy gradient is:

$$\hat{g} = \bar{g} + \mathcal{N}(0, \sigma^2 C^2 I)$$

Here, $\sigma$ is the noise multiplier, and $\mathcal{N}(0, \sigma^2 C^2 I)$ denotes Gaussian noise with mean zero and standard deviation $\sigma C$ per coordinate. This mechanism achieves $(\epsilon, \delta)$-differential privacy guarantees for the training process.
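The sensitivity bound that clipping provides can be verified numerically. The sketch below (not from the lesson itself; all names and values are illustrative) builds two minibatches that differ in a single example and checks that their averaged clipped gradients differ by at most $2C/L$ in $L_2$ norm, since each clipped per-example gradient has norm at most $C$:

```python
import numpy as np

rng = np.random.default_rng(0)
C, L, dim = 1.0, 8, 5  # clip norm, minibatch size, parameter count

def clip(g, C):
    # Scale g down so its L2 norm is at most C; leave small gradients alone.
    norm = np.linalg.norm(g)
    return g * min(1.0, C / norm) if norm > 0 else g

batch = rng.normal(size=(L, dim)) * 3.0           # minibatch D
batch_prime = batch.copy()
batch_prime[0] = rng.normal(size=dim) * 10.0      # D': one example replaced

avg = np.mean([clip(g, C) for g in batch], axis=0)
avg_prime = np.mean([clip(g, C) for g in batch_prime], axis=0)

# The averages differ only in the one replaced example's contribution,
# and each clipped contribution has norm <= C, so the gap is <= 2C/L.
diff = np.linalg.norm(avg - avg_prime)
print(f"||avg - avg'||_2 = {diff:.4f}, bound 2C/L = {2 * C / L:.4f}")
```

This bounded sensitivity is exactly what lets Gaussian noise of scale $\sigma C$ mask any single example's presence.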
To keep track of the cumulative privacy loss over multiple training steps, DP-SGD uses a technique called the moments accountant.
Moments accountant
The moments accountant is a mathematical tool used in DP-SGD to track and tightly bound the cumulative privacy loss (privacy budget) over many training steps. It allows for more accurate accounting of privacy guarantees than naïve composition, especially when training for many epochs.
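The advantage over naïve composition can be illustrated with a simplified, accountant-style calculation for the plain Gaussian mechanism. The sketch below ignores minibatch subsampling (which the full moments accountant also exploits for tighter bounds) and uses the Rényi-DP composition idea: each Gaussian step costs $\alpha / (2\sigma^2)$ at order $\alpha$, costs add over steps, and the total is then converted to an $(\epsilon, \delta)$ guarantee. All function names and parameter values are illustrative, not a real accounting library's API.

```python
import math

def rdp_to_eps(sigma, steps, delta, orders=range(2, 128)):
    # Renyi DP of one Gaussian step at order alpha is alpha / (2 * sigma^2);
    # RDP composes additively over steps, then converts to (eps, delta)-DP.
    best = float("inf")
    for alpha in orders:
        rdp = steps * alpha / (2 * sigma ** 2)
        eps = rdp + math.log(1 / delta) / (alpha - 1)
        best = min(best, eps)
    return best

def naive_eps(sigma, steps, delta_per_step):
    # Basic composition: the standard per-step (eps, delta) bound for the
    # Gaussian mechanism, summed over all steps.
    eps_step = math.sqrt(2 * math.log(1.25 / delta_per_step)) / sigma
    return steps * eps_step

sigma, T, delta = 8.0, 1000, 1e-5
print("naive composition: eps ≈", round(naive_eps(sigma, T, delta / T), 1))
print("RDP-style accounting: eps ≈", round(rdp_to_eps(sigma, T, delta), 1))
```

For many steps the accountant-style bound is orders of magnitude tighter than naïve summation, which is why DP-SGD can train for many epochs within a reasonable privacy budget.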
```python
import numpy as np

# Simulate a batch of per-example gradients (5 samples, 3 parameters each)
per_example_grads = np.array([
    [0.5, 2.0, 1.5],
    [1.2, -0.7, 0.3],
    [2.1, 0.0, -1.0],
    [-1.5, 0.9, 0.8],
    [0.4, -2.2, 0.5]
])

clip_norm = 1.0
noise_stddev = 0.5

# Step 1: Gradient clipping
def clip_gradients(grads, clip_norm):
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    scaling_factors = np.minimum(1.0, clip_norm / (norms + 1e-6))
    return grads * scaling_factors

clipped_grads = clip_gradients(per_example_grads, clip_norm)

# Step 2: Average the clipped gradients
avg_clipped_grad = np.mean(clipped_grads, axis=0)

# Step 3: Add Gaussian noise
noisy_grad = avg_clipped_grad + np.random.normal(0, noise_stddev, size=avg_clipped_grad.shape)

print("Clipped gradients:\n", clipped_grads)
print("Averaged clipped gradient:", avg_clipped_grad)
print("Noisy (DP) gradient:", noisy_grad)
```
1. What is the main purpose of gradient clipping in DP-SGD?
2. Which statement best describes the trade-off in DP-SGD between model accuracy and privacy?