Differentially Private Stochastic Gradient Descent (DP-SGD)

Differentially Private Stochastic Gradient Descent, or DP-SGD, is a powerful approach for training machine learning models while providing formal privacy guarantees for the data used. DP-SGD modifies the standard stochastic gradient descent (SGD) algorithm by introducing two key steps: gradient clipping and noise addition. These steps ensure that the contribution of any single data point to the model update is limited, and the overall training process is differentially private.

Differential privacy is formally defined as follows: a randomized algorithm $\mathcal{A}$ is $(\epsilon, \delta)$-differentially private if, for all datasets $D$ and $D'$ differing on a single entry and for all measurable sets $S$,

$$\Pr[\mathcal{A}(D) \in S] \leq e^{\epsilon} \Pr[\mathcal{A}(D') \in S] + \delta$$
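To make this definition concrete, the short sketch below (not part of DP-SGD itself) checks the inequality empirically for randomized response, a simple mechanism whose privacy parameters are known in closed form. The probability p = 0.75 and the trial count are illustrative values chosen here, not anything prescribed by the definition.

    import numpy as np

    # Randomized response: report the true bit with prob p, flip it with prob 1 - p.
    # This mechanism is (ln(p / (1 - p)), 0)-differentially private.
    p = 0.75
    epsilon = np.log(p / (1 - p))  # ~1.0986, so e^epsilon = 3

    rng = np.random.default_rng(0)

    def randomized_response(true_bit, n_trials):
        keep = rng.random(n_trials) < p
        return np.where(keep, true_bit, 1 - true_bit)

    n = 1_000_000
    # Two "neighboring datasets": the single entry is 0 in D and 1 in D'.
    out_D = randomized_response(0, n)
    out_Dprime = randomized_response(1, n)

    # Empirical probability of observing output 0 under each dataset
    pr_D = np.mean(out_D == 0)        # close to 0.75
    pr_Dprime = np.mean(out_Dprime == 0)  # close to 0.25

    print("Pr[A(D) = 0]            :", pr_D)
    print("e^eps * Pr[A(D') = 0]   :", np.exp(epsilon) * pr_Dprime)
    # The definition (with delta = 0) requires the first value to be <= the second.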

In DP-SGD, during each minibatch update, the gradient $g_i$ computed for each individual data sample is first clipped to a fixed norm $C$. This gradient clipping step ensures that no single sample can have a disproportionate influence on the model update, which is crucial for bounding the sensitivity of the training process. The clipped gradient for sample $i$ is:

$$\tilde{g}_i = g_i \cdot \min\left(1, \frac{C}{\|g_i\|_2}\right)$$
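As a quick numeric check of the clipping rule (the gradient values here are made up purely for illustration):

    import numpy as np

    C = 1.0
    g = np.array([3.0, 4.0])          # ||g||_2 = 5.0 > C, so it gets rescaled
    scale = min(1.0, C / np.linalg.norm(g))
    print(g * scale)                  # [0.6, 0.8], norm exactly C

    g_small = np.array([0.1, 0.2])    # ||g_small||_2 ≈ 0.22 <= C, left unchanged
    print(g_small * min(1.0, C / np.linalg.norm(g_small)))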

After clipping, the gradients are averaged:

$$\bar{g} = \frac{1}{L} \sum_{i=1}^{L} \tilde{g}_i$$

where $L$ is the minibatch size. Then, Gaussian noise is added to the averaged gradient to mask the presence or absence of any single data point. The noisy gradient is:

$$\hat{g} = \bar{g} + \mathcal{N}(0, \sigma^2 C^2 I)$$

Here, $\sigma$ is the noise multiplier, and $\mathcal{N}(0, \sigma^2 C^2 I)$ denotes Gaussian noise with mean zero and standard deviation $\sigma C$ per coordinate. This mechanism achieves $(\epsilon, \delta)$-differential privacy guarantees for the training process.

To keep track of the cumulative privacy loss over multiple training steps, DP-SGD uses a technique called the moments accountant.

Definition: Moments accountant
The moments accountant is a mathematical tool used in DP-SGD to track and tightly bound the cumulative privacy loss (privacy budget) over many training steps. It allows for more accurate accounting of privacy guarantees than naïve composition, especially when training for many epochs.
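The sketch below is a rough illustration of why tighter accounting matters; it is not the moments accountant itself. It compares naive composition against the advanced composition theorem for many identical $(\epsilon, \delta)$-DP steps. The per-step epsilon, the slack delta, and the step count are placeholder values; the moments accountant would give still tighter bounds by tracking the full privacy-loss distribution and exploiting minibatch subsampling.

    import numpy as np

    eps_step = 0.01      # per-step epsilon (placeholder value)
    delta_prime = 1e-5   # slack delta used by advanced composition
    steps = 10_000       # number of training steps

    # Naive composition: per-step epsilons simply add up.
    eps_naive = steps * eps_step

    # Advanced composition (Dwork-Rothblum-Vadhan):
    # eps_total = eps*sqrt(2k*ln(1/delta')) + k*eps*(e^eps - 1)
    eps_advanced = (eps_step * np.sqrt(2 * steps * np.log(1 / delta_prime))
                    + steps * eps_step * (np.exp(eps_step) - 1))

    print(f"Naive composition:    eps ≈ {eps_naive:.2f}")     # ≈ 100
    print(f"Advanced composition: eps ≈ {eps_advanced:.2f}")  # ≈ 5.8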

import numpy as np

# Simulate a batch of per-example gradients (5 samples, 3 parameters each)
per_example_grads = np.array([
    [0.5, 2.0, 1.5],
    [1.2, -0.7, 0.3],
    [2.1, 0.0, -1.0],
    [-1.5, 0.9, 0.8],
    [0.4, -2.2, 0.5]
])

clip_norm = 1.0
noise_stddev = 0.5

# Step 1: Gradient clipping
def clip_gradients(grads, clip_norm):
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    scaling_factors = np.minimum(1.0, clip_norm / (norms + 1e-6))
    return grads * scaling_factors

clipped_grads = clip_gradients(per_example_grads, clip_norm)

# Step 2: Average the clipped gradients
avg_clipped_grad = np.mean(clipped_grads, axis=0)

# Step 3: Add Gaussian noise
noisy_grad = avg_clipped_grad + np.random.normal(0, noise_stddev, size=avg_clipped_grad.shape)

print("Clipped gradients:\n", clipped_grads)
print("Averaged clipped gradient:", avg_clipped_grad)
print("Noisy (DP) gradient:", noisy_grad)
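Building on the snippet above, here is a hedged sketch of how the clipped, noised gradient would plug into an actual parameter-update loop. It assumes a toy linear-regression model with squared-error loss and made-up hyperparameters; in particular, the noise multiplier is kept small so the toy example converges, whereas a real deployment would choose $\sigma$ from a target $(\epsilon, \delta)$ budget.

    import numpy as np

    rng = np.random.default_rng(42)

    # Toy data: y = 2*x1 - 1*x2 + noise
    X = rng.normal(size=(200, 2))
    y = X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=200)

    w = np.zeros(2)          # model parameters
    lr = 0.1                 # learning rate
    clip_norm = 1.0          # C
    noise_multiplier = 0.1   # sigma (kept small for this toy illustration)
    batch_size = 32          # L

    for step in range(200):
        idx = rng.choice(len(X), size=batch_size, replace=False)
        Xb, yb = X[idx], y[idx]

        # Per-example gradients of squared error: grad_i = 2 * (x_i . w - y_i) * x_i
        residuals = Xb @ w - yb
        per_example_grads = 2 * residuals[:, None] * Xb

        # Step 1: clip each per-example gradient to norm C
        norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
        clipped = per_example_grads * np.minimum(1.0, clip_norm / (norms + 1e-6))

        # Steps 2-3: average, then add Gaussian noise with std sigma * C per coordinate
        avg_grad = clipped.mean(axis=0)
        noisy_grad = avg_grad + rng.normal(0, noise_multiplier * clip_norm, size=w.shape)

        # Standard SGD update with the privatized gradient
        w -= lr * noisy_grad

    print("Learned weights (true values are [2, -1]):", w)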

1. What is the main purpose of gradient clipping in DP-SGD?

2. Which statement best describes the trade-off in DP-SGD between model accuracy and privacy?

