Learn Monitoring Model and Data Drift | Monitoring and Continuous Delivery
MLOps for Machine Learning Engineers

Monitoring Model and Data Drift

Machine learning models in production face a dynamic environment where both the data and the underlying business context can change over time. Two key phenomena to watch for are model drift and data drift.

Model drift refers to the decline in model performance as the relationship between input features and the target variable changes. There are two main types of model drift:

  • Concept drift: the statistical relationship between features and the target variable changes over time; this means the model's underlying assumptions no longer hold, so predictions become less accurate;
  • Performance drift: the model's accuracy or other evaluation metrics degrade, even if the feature-target relationship appears stable; this can result from changes in external factors or evolving business objectives.
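Performance drift is usually caught by tracking evaluation metrics on recently labeled production data and comparing them against a baseline. The sketch below is illustrative: the data is simulated, and `rolling_accuracy` is a hypothetical helper, not a library function.

```python
import numpy as np

def rolling_accuracy(y_true, y_pred, window=100):
    """Accuracy over a sliding window of the most recent predictions."""
    correct = (np.asarray(y_true) == np.asarray(y_pred)).astype(float)
    if len(correct) < window:
        return correct.mean()
    return correct[-window:].mean()

# Simulated labeled production stream where prediction quality degrades:
# the probability of a wrong prediction grows from 5% to 35% over time.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)
flip = rng.random(500) < np.linspace(0.05, 0.35, 500)
y_pred = np.where(flip, 1 - y_true, y_true)

early = rolling_accuracy(y_true[:100], y_pred[:100])
late = rolling_accuracy(y_true, y_pred)
print(f"early accuracy: {early:.2f}, recent accuracy: {late:.2f}")
```

Comparing the rolling metric against an alert threshold (for example, a fixed drop from the validation-set score) turns this into an automated performance-drift check.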

Data drift, on the other hand, occurs when the distribution of input data itself shifts from what the model was originally trained on. Data drift can be categorized as:

  • Covariate drift: the distribution of input features changes, but the relationship between features and target remains the same;
  • Prior probability drift: the distribution of the target variable changes, such as a shift in the proportion of classes in classification problems;
  • Feature distribution drift: specific input features experience changes in their statistical properties, such as mean or variance, which may impact model predictions.
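A common way to quantify feature distribution drift is the Population Stability Index (PSI), which compares binned frequencies of a feature between a reference sample and recent data. The implementation below is a minimal sketch; the frequently cited cut-offs (below 0.1: stable, above 0.25: significant shift) are rules of thumb, not hard standards.

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between two samples of one feature."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_counts, _ = np.histogram(expected, bins=edges)
    a_counts, _ = np.histogram(actual, bins=edges)
    # Clip to avoid division by zero / log(0) for empty bins
    e_pct = np.clip(e_counts / e_counts.sum(), 1e-6, None)
    a_pct = np.clip(a_counts / a_counts.sum(), 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(42)
baseline = rng.normal(loc=0.0, scale=1.0, size=1000)   # training snapshot
stable = rng.normal(loc=0.0, scale=1.0, size=1000)     # same distribution
shifted = rng.normal(loc=0.5, scale=1.2, size=1000)    # drifted feature
print(f"PSI (stable):  {psi(baseline, stable):.3f}")
print(f"PSI (shifted): {psi(baseline, shifted):.3f}")
```

Because each term of the sum is non-negative, PSI is zero only when the binned distributions match exactly and grows as they diverge.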

Monitoring for these changes is essential: if you do not detect drift, your model's predictions may become unreliable, leading to poor business outcomes or even critical failures in automated decision systems. Effective monitoring lets you catch these issues early and trigger retraining, model updates, or deeper investigations as needed.

Note
Definition

Model drift occurs when a model's performance degrades because the data distribution or the feature-target relationship has changed since training.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import ks_2samp

# Simulated training data and recent production data
np.random.seed(42)
training_feature = np.random.normal(loc=0, scale=1, size=1000)
recent_feature = np.random.normal(loc=0.5, scale=1.2, size=1000)

# Plot distributions
plt.figure(figsize=(10, 5))
plt.hist(training_feature, bins=30, alpha=0.5, label="Training Data", density=True)
plt.hist(recent_feature, bins=30, alpha=0.5, label="Recent Data", density=True)
plt.legend()
plt.title("Feature Distribution: Training vs. Recent Data")
plt.xlabel("Feature Value")
plt.ylabel("Density")
plt.show()

# Use Kolmogorov-Smirnov test to compare distributions
statistic, p_value = ks_2samp(training_feature, recent_feature)
print(f"KS Statistic: {statistic:.3f}, p-value: {p_value:.3f}")
if p_value < 0.05:
    print("Significant data drift detected.")
else:
    print("No significant data drift detected.")

Which statement best describes the differences between concept drift, performance drift, covariate drift, and prior probability drift?

Select the correct answer


Section 5. Chapter 1

