Understanding Loss Functions in Machine Learning
Advanced and Specialized Losses

Kullback–Leibler (KL) Divergence: Information-Theoretic Loss

The Kullback–Leibler (KL) divergence is a fundamental concept in information theory and machine learning, measuring how one probability distribution diverges from a second, expected distribution. Mathematically, it is defined as:

D_{KL}(P \| Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)}

Here, P and Q are discrete probability distributions over the same set of events or outcomes, and the sum runs over all possible events i. In this formula, P(i) represents the true probability of event i, while Q(i) represents the probability assigned to event i by the model or approximation.
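To see the formula in action, here is a minimal NumPy sketch; the two three-outcome distributions p and q are made-up values chosen purely for illustration, and all entries are assumed strictly positive so the logarithm is well defined:

```python
import numpy as np

# Hypothetical discrete distributions over the same three outcomes.
p = np.array([0.6, 0.3, 0.1])   # true distribution P
q = np.array([0.5, 0.4, 0.1])   # model / approximating distribution Q

# D_KL(P || Q) = sum_i P(i) * log(P(i) / Q(i))
kl_pq = np.sum(p * np.log(p / q))
print(kl_pq)  # small positive number, since P and Q are similar
```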

Note

KL divergence quantifies the inefficiency of assuming Q when the true distribution is P. It can be interpreted as the extra number of bits needed to encode samples from P using a code optimized for Q instead of the optimal code for P.
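To make the "extra bits" reading concrete, the divergence can be computed with base-2 logarithms as the gap between the cross-entropy H(P, Q) and the entropy H(P); the distributions below are the same illustrative values as before, not taken from any dataset:

```python
import numpy as np

p = np.array([0.6, 0.3, 0.1])   # true distribution P
q = np.array([0.5, 0.4, 0.1])   # coding distribution Q

# With log base 2, cross-entropy minus entropy equals D_KL(P || Q):
# the average coding overhead, in bits per sample.
cross_entropy_bits = -np.sum(p * np.log2(q))
entropy_bits = -np.sum(p * np.log2(p))
kl_bits = cross_entropy_bits - entropy_bits
print(kl_bits)  # extra bits per symbol when coding P with a code built for Q
```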

KL divergence has several important properties. First, it is asymmetric:

D_{KL}(P \| Q) \neq D_{KL}(Q \| P)

This means the divergence from P to Q is not the same as from Q to P, reflecting that the "cost" of assuming Q when P is true is not the same as the reverse.
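A quick numerical check of this asymmetry; the two distributions are arbitrary examples:

```python
import numpy as np

def kl(p, q):
    """Discrete KL divergence D_KL(p || q), natural log, strictly positive inputs."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p * np.log(p / q))

p = np.array([0.9, 0.1])   # a peaked distribution
q = np.array([0.5, 0.5])   # a uniform distribution

print(kl(p, q))  # ~0.368
print(kl(q, p))  # ~0.511 — a different value, since KL is not symmetric
```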

Second, KL divergence is non-negative:

D_{KL}(P \| Q) \geq 0

for all valid probability distributions P and Q, with equality if and only if P = Q everywhere.
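Both facts are easy to verify numerically. The sketch below draws a few random distributions (an arbitrary Dirichlet setup, chosen only for illustration) and confirms that the divergence is never negative and is exactly zero when the two distributions coincide:

```python
import numpy as np

rng = np.random.default_rng(0)

def kl(p, q):
    return np.sum(p * np.log(p / q))

# Random strictly positive distributions: D_KL is always >= 0.
for _ in range(5):
    p = rng.dirichlet(np.ones(4))
    q = rng.dirichlet(np.ones(4))
    assert kl(p, q) >= 0

# Equality holds only when the distributions match everywhere.
p = np.array([0.25, 0.25, 0.5])
print(kl(p, p))  # 0.0
```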

In machine learning, KL divergence is widely used as a loss function, especially in scenarios involving probability distributions. It plays a central role in variational inference, where it measures how close an approximate distribution is to the true posterior. Additionally, KL divergence often appears as a regularization term in models that seek to prevent overfitting by encouraging distributions predicted by the model to remain close to a prior or reference distribution.
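As a schematic illustration of the regularization use, the sketch below adds a KL penalty that pulls a model's predicted class distribution toward a uniform prior. The predicted probabilities, the prior, the true class index, and the weight beta are all made-up values, and no particular framework is assumed:

```python
import numpy as np

def kl(p, q):
    return np.sum(p * np.log(p / q))

# Hypothetical setup: a model outputs softmax probabilities over 3 classes,
# and we regularize the prediction toward a uniform reference distribution.
predicted = np.array([0.7, 0.2, 0.1])   # model output (illustrative)
prior = np.full(3, 1.0 / 3.0)           # uniform prior / reference distribution

task_loss = -np.log(predicted[0])       # e.g. cross-entropy, true class = 0
kl_penalty = kl(predicted, prior)       # how far the prediction strays from the prior
beta = 0.1                              # regularization strength (made-up value)

total_loss = task_loss + beta * kl_penalty
print(total_loss)
```

The same pattern appears in variational autoencoders, where the penalty term is the KL divergence between the approximate posterior and a fixed prior such as a standard normal distribution.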

