Understanding Loss Functions in Machine Learning
Advanced and Specialized Losses

Jensen–Shannon Divergence and GANs

The Jensen–Shannon (JS) divergence is a fundamental concept in information theory and machine learning, especially in the context of generative adversarial networks (GANs). It builds upon the Kullback–Leibler (KL) divergence by introducing a symmetrized and smoothed measure of the difference between two probability distributions. The formula for the Jensen–Shannon divergence is:

JS(P \| Q) = \frac{1}{2} D_{KL}(P \| M) + \frac{1}{2} D_{KL}(Q \| M)

where M = \frac{1}{2}(P + Q), and D_{KL} denotes the KL divergence.
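To make the definition concrete, here is a minimal NumPy sketch (not from the original lesson) that computes the JS divergence of two discrete distributions exactly as written above, by averaging the two KL divergences to the mixture M. The example distributions P and Q are arbitrary placeholders.

```python
# Minimal sketch of JS divergence for discrete distributions, assuming NumPy.
import numpy as np

def kl_divergence(p, q):
    """D_KL(P || Q) in nats; terms with p_i = 0 contribute nothing."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def js_divergence(p, q):
    """JS(P || Q) = 0.5 * D_KL(P || M) + 0.5 * D_KL(Q || M), with M = (P + Q) / 2."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

P = np.array([0.1, 0.4, 0.5])   # placeholder distribution
Q = np.array([0.3, 0.3, 0.4])   # placeholder distribution
print(js_divergence(P, Q))      # a small non-negative value
```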

Note

Jensen–Shannon divergence is symmetric and bounded, making it suitable for adversarial learning.

Symmetry means that JS(P \| Q) = JS(Q \| P), unlike KL divergence, which is asymmetric. The JS divergence is also always bounded between 0 and \log 2 (or 1, depending on the logarithm base), making it numerically stable and easier to interpret in optimization problems.
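A quick numerical check makes both properties tangible. The self-contained sketch below (with placeholder distributions) confirms that swapping P and Q leaves the value unchanged and that two fully disjoint distributions reach the natural-log upper bound of log 2.

```python
# Symmetry and boundedness check for the JS divergence, assuming NumPy.
import numpy as np

def js_divergence(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

P = np.array([0.2, 0.5, 0.3, 0.0])
Q = np.array([0.0, 0.1, 0.2, 0.7])
print(np.isclose(js_divergence(P, Q), js_divergence(Q, P)))  # True: symmetric

disjoint_P = np.array([1.0, 0.0])
disjoint_Q = np.array([0.0, 1.0])
print(js_divergence(disjoint_P, disjoint_Q), np.log(2))      # both ~0.6931: the bound
```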

In the context of GANs, the generator and discriminator are engaged in a minimax game: the generator tries to produce samples that are indistinguishable from the real data, while the discriminator tries to tell them apart. The original GAN formulation uses the JS divergence as the theoretical basis for its loss function. This choice is motivated by the fact that JS divergence provides a meaningful, symmetric measure of the overlap between the generated and real data distributions. If the distributions do not overlap, the JS divergence reaches its maximum, indicating that the generated samples are easily distinguished from the real data. As the generator improves and the distributions begin to overlap, the divergence decreases, guiding the model towards better performance.
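The sketch below shows how this minimax game is commonly written as alternating training steps, assuming PyTorch; the tiny networks, the random stand-in for real data, and the hyperparameters are illustrative placeholders rather than the lesson's own code. The connection to JS divergence is that, at the optimal discriminator, the generator's objective reduces to 2 \cdot JS(P_{data} \| P_G) - \log 4.

```python
# Minimal sketch of the original (minimax) GAN losses, assuming PyTorch.
import torch
import torch.nn as nn

torch.manual_seed(0)
latent_dim, data_dim, batch = 8, 2, 64
eps = 1e-8  # numerical safety inside the logarithms

# Tiny placeholder generator and discriminator, just to make the losses concrete.
G = nn.Sequential(nn.Linear(latent_dim, 16), nn.ReLU(), nn.Linear(16, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.randn(batch, data_dim) + 3.0   # stand-in for a batch of real data
z = torch.randn(batch, latent_dim)          # latent noise

# Discriminator step: maximize E[log D(x)] + E[log(1 - D(G(z)))]
fake = G(z).detach()
d_loss = -(torch.log(D(real) + eps).mean() + torch.log(1 - D(fake) + eps).mean())
opt_D.zero_grad()
d_loss.backward()
opt_D.step()

# Generator step (minimax form): minimize E[log(1 - D(G(z)))]
fake = G(z)
g_loss = torch.log(1 - D(fake) + eps).mean()
opt_G.zero_grad()
g_loss.backward()
opt_G.step()
```

In practice, many implementations swap the minimax generator loss for the non-saturating heuristic that maximizes \log D(G(z)) instead, precisely because of the vanishing-gradient issue discussed next.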

However, the bounded and symmetric nature of JS divergence can also lead to unique training dynamics. When the real and generated data distributions have little or no overlap, the gradient provided by the JS divergence can vanish, making it difficult for the generator to improve. This phenomenon is known as the vanishing gradient problem in GANs. Despite this limitation, the JS divergence remains a foundational concept in understanding how GANs measure the distance between probability distributions and why certain loss functions are chosen for adversarial learning.
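The saturation behind this problem is easy to see numerically. In the illustrative sketch below, the "real" and "generated" distributions never overlap, and the JS divergence stays pinned at log 2 no matter how far apart they are, so moving the generated distribution closer (while still disjoint) produces no change in the loss and hence no useful gradient.

```python
# Illustration of JS-divergence saturation for non-overlapping distributions.
import numpy as np

def js_divergence(p, q):
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

support = 20
real = np.zeros(support)
real[0] = 1.0                     # "real data" concentrated at position 0

for shift in [1, 5, 10, 19]:
    fake = np.zeros(support)
    fake[shift] = 1.0             # "generated data" placed somewhere else
    print(shift, js_divergence(real, fake))   # always log 2 ~= 0.6931
```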

