
Kullback–Leibler (KL) Divergence: Information-Theoretic Loss

The Kullback–Leibler (KL) divergence is a fundamental concept in information theory and machine learning, measuring how one probability distribution diverges from a second, expected distribution. Mathematically, it is defined as:

D_{KL}(P \| Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)}

Here, P and Q are discrete probability distributions over the same set of events or outcomes, and the sum runs over all possible events i. In this formula, P(i) represents the true probability of event i, while Q(i) represents the probability assigned to event i by the model or approximation.

Note

KL divergence quantifies the inefficiency of assuming Q when the true distribution is P. It can be interpreted as the extra number of bits needed to encode samples from P using a code optimized for Q instead of the optimal code for P.
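To make the definition concrete, here is a minimal sketch that computes the sum directly with NumPy. The distributions p and q and the helper kl_divergence are illustrative choices, not part of any particular library:

import numpy as np

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i P(i) * log(P(i) / Q(i)).

    Uses the natural logarithm, so the result is in nats;
    replace np.log with np.log2 to measure it in bits.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0                     # terms with P(i) = 0 contribute 0 by convention
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = np.array([0.5, 0.3, 0.2])   # "true" distribution P
q = np.array([0.4, 0.4, 0.2])   # model's approximation Q
print(kl_divergence(p, q))      # a small positive number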

KL divergence has several important properties. First, it is asymmetric:

D_{KL}(P \| Q) \neq D_{KL}(Q \| P)

This means the divergence from P to Q is not the same as from Q to P, reflecting that the "cost" of assuming Q when P is true is not the same as the reverse.
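A quick numerical illustration of this asymmetry (a sketch using scipy.stats.entropy, which returns the KL divergence when a second distribution is passed; the distributions are made up for illustration):

import numpy as np
from scipy.stats import entropy   # entropy(p, q) returns D_KL(P || Q)

p = np.array([0.9, 0.05, 0.05])   # a peaked distribution
q = np.array([1/3, 1/3, 1/3])     # a uniform distribution

print(entropy(p, q))   # D_KL(P || Q)
print(entropy(q, p))   # D_KL(Q || P) -- generally a different value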

Second, KL divergence is non-negative:

D_{KL}(P \| Q) \geq 0

for all valid probability distributions P and Q, with equality if and only if P = Q everywhere.
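A small numerical check (same illustrative setup as above) shows that the value never drops below zero and vanishes exactly when the two distributions coincide:

import numpy as np
from scipy.stats import entropy

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])

print(entropy(p, q))   # strictly positive, since P differs from Q
print(entropy(p, p))   # exactly 0.0, since the distributions are identical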

In machine learning, KL divergence is widely used as a loss function, especially in scenarios involving probability distributions. It plays a central role in variational inference, where it measures how close an approximate distribution is to the true posterior. Additionally, KL divergence often appears as a regularization term in models that seek to prevent overfitting by encouraging distributions predicted by the model to remain close to a prior or reference distribution.
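As one common way this appears in practice (a sketch assuming PyTorch; the logits and targets below are invented for illustration), torch.nn.functional.kl_div can serve as a training loss between predicted and target distributions. Note that it expects the prediction as log-probabilities and the target as probabilities:

import torch
import torch.nn.functional as F

# raw model outputs (logits) for a batch of 2 examples over 3 classes
logits = torch.tensor([[1.0, 2.0, 0.5],
                       [0.2, 0.1, 3.0]])

# target distributions; each row is a valid probability distribution
target = torch.tensor([[0.7, 0.2, 0.1],
                       [0.1, 0.1, 0.8]])

log_probs = F.log_softmax(logits, dim=1)                    # convert logits to log-probabilities
loss = F.kl_div(log_probs, target, reduction="batchmean")   # D_KL(target || prediction), averaged over the batch
print(loss)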
