Feature Drift and Data Drift Detection

What Is Drift

In machine learning, drift refers to a change in the underlying data or relationships that a model relies on to make predictions. There are three main types of drift you should understand: data drift, feature drift, and concept drift.

Definition

Data drift is a broad term that describes any change in the statistical properties of the input data over time. This might mean the overall distribution of the dataset has shifted, which can affect model performance even if the relationships between features and targets remain the same.

Definition

Feature drift is a more specific case where the distribution of one or more individual features changes. For example, the average age of customers in your dataset might increase over time, or the range of values for a sensor reading might shift.
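A per-feature check makes this concrete. The sketch below is only an illustration on synthetic data: the feature names `age` and `sensor`, the helper `standardized_mean_shift`, and the 0.25 cutoff are assumptions chosen for the example, not values from this lesson.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical reference (older) and current (newer) samples per feature.
# Only "age" is generated with a shifted mean; "sensor" is left unchanged.
reference = {
    "age": rng.normal(35, 8, 2000),
    "sensor": rng.normal(0.5, 0.1, 2000),
}
current = {
    "age": rng.normal(41, 8, 2000),
    "sensor": rng.normal(0.5, 0.1, 2000),
}

def standardized_mean_shift(ref, cur):
    """Absolute change in the mean, in units of the reference standard deviation."""
    return abs(cur.mean() - ref.mean()) / ref.std()

shifts = {
    name: standardized_mean_shift(reference[name], current[name])
    for name in reference
}

# Illustrative cutoff (an assumption, not a standard): flag shifts above 0.25.
drifted = [name for name, shift in shifts.items() if shift > 0.25]

for name, shift in shifts.items():
    print(f"{name}: standardized mean shift = {shift:.2f}")
print("Features flagged as drifted:", drifted)
```

Checking features one at a time shows which input moved, which a single dataset-level summary can hide.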

Definition

Concept drift occurs when the relationship between input features and the target variable changes. This means that even if the input data appears similar, the way it maps to the output has changed. For instance, if a model predicts whether an email is spam, but spammers start using new tactics, the features that once indicated spam may no longer be reliable.
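The following sketch shows concept drift in isolation, under simplified assumptions: the inputs are drawn from the same distribution in both periods, and only the (hypothetical) labeling rule changes, from a threshold of 0.0 to a threshold of 1.0. The fixed `predict` function stands in for a model trained on the old rule.

```python
import numpy as np

rng = np.random.default_rng(0)

# Inputs follow the same distribution in both periods (no data drift).
x_old = rng.normal(loc=0.0, scale=1.0, size=5000)
x_new = rng.normal(loc=0.0, scale=1.0, size=5000)

# Hypothetical labeling rules: the decision boundary moves from 0.0 to 1.0.
y_old = (x_old > 0.0).astype(int)
y_new = (x_new > 1.0).astype(int)

def predict(x):
    """A fixed stand-in "model" that learned the old rule."""
    return (x > 0.0).astype(int)

acc_old = float(np.mean(predict(x_old) == y_old))
acc_new = float(np.mean(predict(x_new) == y_new))

print(f"Input means: {x_old.mean():.3f} (old) vs {x_new.mean():.3f} (new)")
print(f"Accuracy: {acc_old:.3f} (old rule) vs {acc_new:.3f} (new rule)")
```

The input statistics barely move, yet accuracy drops, so a check that looks only at input distributions would not catch this kind of drift.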

Understanding the differences between these types of drift is crucial for maintaining reliable machine learning pipelines. If you do not monitor for drift, your models can become less accurate, leading to poor decisions and outcomes.

Note

Common causes of drift include:

  • Temporal changes: data naturally evolves over time;
  • Sampling bias: data collection methods or sources change, introducing new patterns;
  • Behavioral shifts: users, customers, or systems change their behavior, leading to new data trends.

```python
import numpy as np
import matplotlib.pyplot as plt

# Generate synthetic feature data for two time periods
np.random.seed(42)
feature_period1 = np.random.normal(loc=50, scale=5, size=1000)
feature_period2 = np.random.normal(loc=55, scale=7, size=1000)

# Overlay the two histograms on a shared density scale
plt.figure(figsize=(8, 5))
plt.hist(feature_period1, bins=30, alpha=0.6, label="Period 1", color="blue", density=True)
plt.hist(feature_period2, bins=30, alpha=0.6, label="Period 2", color="orange", density=True)
plt.title("Feature Distribution Over Time")
plt.xlabel("Feature Value")
plt.ylabel("Density")
plt.legend()
plt.show()
```

You can often spot feature drift by visually comparing feature distributions from different time periods, as in the plot above. If the shapes, centers, or spreads of the distributions change noticeably, this is a strong indicator of drift. For example, if the histogram for "Period 2" is shifted to the right and has a wider spread than "Period 1", it means the feature's average value and variability have both changed. Such changes can impact your model's predictions and may require retraining or adjustment.
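A visual comparison can be backed up with numbers. The sketch below regenerates the same two synthetic periods and compares their means, standard deviations, and a two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs). The helper `ks_statistic` and the extra sample `feature_same` are assumptions added for illustration. In practice a library routine such as `scipy.stats.ks_2samp` could be used instead, and it also reports a p-value.

```python
import numpy as np

# Same synthetic data as the two periods described above.
np.random.seed(42)
feature_period1 = np.random.normal(loc=50, scale=5, size=1000)
feature_period2 = np.random.normal(loc=55, scale=7, size=1000)
# Hypothetical extra sample from the Period 1 distribution, used as a no-drift baseline.
feature_same = np.random.normal(loc=50, scale=5, size=1000)

def ks_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of samples a and b."""
    a = np.sort(a)
    b = np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

mean_shift = float(feature_period2.mean() - feature_period1.mean())
std_ratio = float(feature_period2.std() / feature_period1.std())
ks_drift = ks_statistic(feature_period1, feature_period2)
ks_baseline = ks_statistic(feature_period1, feature_same)

print(f"Mean shift: {mean_shift:.2f}")
print(f"Std ratio (Period 2 / Period 1): {std_ratio:.2f}")
print(f"KS statistic, Period 1 vs Period 2: {ks_drift:.3f}")
print(f"KS statistic, Period 1 vs same-distribution sample: {ks_baseline:.3f}")
```

Comparing against a same-distribution baseline gives a rough sense of how much of the gap is ordinary sampling noise rather than drift.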
