
Drift Metrics

When monitoring for drift in data science workflows, you rely on quantitative metrics to assess whether the distribution of your features or model outputs has changed over time. Three core drift metrics are commonly used: Kullback-Leibler (KL) divergence, Population Stability Index (PSI), and the Kolmogorov–Smirnov (KS) test. Each metric captures a different aspect of how distributions diverge, and understanding their mathematical intuition will help you choose the right tool for your scenario.

KL divergence measures how one probability distribution differs from a reference distribution. It is asymmetric and quantifies the information lost when you approximate the true distribution with another. PSI is widely used in credit scoring and business analytics; it quantifies how much a variable's distribution has shifted between a baseline and a new dataset, typically by grouping values into bins and summing the differences in proportions. The KS test is a non-parametric test that compares the cumulative distributions of two samples, focusing on the largest difference between their empirical cumulative distribution functions (ECDFs). While KL divergence and PSI provide a single summary statistic, the KS test yields both a statistic and a p-value, helping you assess statistical significance.

import numpy as np

# Define two synthetic probability distributions
p = np.array([0.1, 0.4, 0.5])
q = np.array([0.2, 0.3, 0.5])

# Ensure no zero probabilities to avoid division errors
p = np.where(p == 0, 1e-10, p)
q = np.where(q == 0, 1e-10, q)

# Compute KL divergence: sum(p * log(p/q))
kl_divergence = np.sum(p * np.log(p / q))
print("KL Divergence:", kl_divergence)
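The same idea extends to PSI. Below is a minimal sketch of how PSI could be computed between a baseline sample and a new sample, assuming quantile-based bin edges derived from the baseline; the function name, bin count, and synthetic data are illustrative assumptions, not part of this lesson.

import numpy as np

def population_stability_index(baseline, current, bins=10):
    # Bin edges taken from baseline quantiles (an assumed, common binning choice)
    edges = np.quantile(baseline, np.linspace(0, 1, bins + 1))
    # Widen the outer edges so new values outside the baseline range are still counted
    edges[0] = min(edges[0], current.min())
    edges[-1] = max(edges[-1], current.max())

    # Proportion of observations falling into each bin for both samples
    base_prop = np.histogram(baseline, bins=edges)[0] / len(baseline)
    curr_prop = np.histogram(current, bins=edges)[0] / len(current)

    # Avoid zero proportions before taking logs
    base_prop = np.where(base_prop == 0, 1e-10, base_prop)
    curr_prop = np.where(curr_prop == 0, 1e-10, curr_prop)

    # PSI = sum((current - baseline) * ln(current / baseline))
    return np.sum((curr_prop - base_prop) * np.log(curr_prop / base_prop))

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 10000)
current = rng.normal(0.3, 1.1, 10000)  # shifted and slightly wider distribution
print("PSI:", population_stability_index(baseline, current))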

To interpret these metrics in practice, you need to set thresholds that indicate when drift is significant enough to warrant action. For KL divergence, values close to zero suggest the distributions are similar, while higher values indicate more divergence; practical thresholds depend on context, but values above 0.1–0.5 often suggest notable drift. For PSI, a value below 0.1 typically means no drift, 0.1–0.25 indicates moderate drift, and above 0.25 signals significant drift. The KS test produces a statistic between 0 and 1, with higher values reflecting greater divergence; a p-value below 0.05 usually means the difference is statistically significant.
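To make the KS workflow concrete, here is a small sketch that compares a baseline sample against a new sample with scipy.stats.ks_2samp and flags drift when the p-value falls below 0.05; the synthetic data and the exact cut-off are illustrative assumptions.

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
baseline = rng.normal(0.0, 1.0, 5000)  # reference window
current = rng.normal(0.4, 1.0, 5000)   # new window with a shifted mean

# The KS statistic is the largest gap between the two ECDFs
statistic, p_value = ks_2samp(baseline, current)
print(f"KS statistic: {statistic:.3f}, p-value: {p_value:.4f}")

# Flag drift when the difference is statistically significant
if p_value < 0.05:
    print("Drift detected: distributions differ significantly.")
else:
    print("No significant drift detected.")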

When using these metrics, always consider the context and the business impact of drift. Small changes may be acceptable in some settings, while even minor shifts could be critical in high-stakes applications.

Note

Feature drift metrics, such as KL divergence, PSI, and KS test, measure changes in the input data distribution over time. Model drift metrics, on the other hand, assess changes in the performance or predictions of your model, such as accuracy or AUC degradation. It's important to monitor both types to maintain model reliability.
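As an illustration of the second category, a minimal sketch of model drift monitoring might track accuracy on successive scoring windows and compare it against a baseline value; the baseline accuracy, tolerance, and weekly numbers below are hypothetical.

baseline_accuracy = 0.91   # accuracy measured at deployment time (assumed)
allowed_drop = 0.05        # hypothetical tolerance before raising an alert

# Hypothetical accuracy measured on each new weekly batch of labeled data
weekly_accuracy = [0.90, 0.89, 0.88, 0.84, 0.82]

for week, acc in enumerate(weekly_accuracy, start=1):
    degraded = (baseline_accuracy - acc) > allowed_drop
    status = "ALERT: possible model drift" if degraded else "OK"
    print(f"Week {week}: accuracy={acc:.2f} -> {status}")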


Which drift metric is commonly used to quantify the information lost when approximating one probability distribution with another?

