Model-Based Detection
Feature Drift and Data Drift Detection

Model-based drift detection is a practical way to identify changes in data distributions using machine learning classifiers.

  • Combine both datasets and assign labels: 0 for reference and 1 for current;
  • Train a classifier, such as logistic regression, to distinguish the reference (historical) data from the current (incoming) data;
  • The classifier learns any systematic differences between the two groups;
  • If the classifier separates the datasets well, the two distributions have drifted apart.

This model-based approach is especially valuable for complex or high-dimensional data, where traditional statistical tests may not be sensitive enough to detect subtle changes.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, accuracy_score
from sklearn.model_selection import train_test_split

# Generate synthetic reference data (normal distribution)
np.random.seed(42)
reference = np.random.normal(loc=0, scale=1, size=(500, 2))
reference_labels = np.zeros(reference.shape[0])

# Generate synthetic current data (drifted: shifted mean)
current = np.random.normal(loc=1.5, scale=1, size=(500, 2))
current_labels = np.ones(current.shape[0])

# Combine datasets
X = np.vstack([reference, current])
y = np.concatenate([reference_labels, current_labels])

# Split into train/test for the drift detector
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit logistic regression classifier
clf = LogisticRegression()
clf.fit(X_train, y_train)

# Predict and evaluate
y_pred = clf.predict(X_test)
y_pred_proba = clf.predict_proba(X_test)[:, 1]

accuracy = accuracy_score(y_test, y_pred)
auc = roc_auc_score(y_test, y_pred_proba)

print("Drift Detector Accuracy:", accuracy)
print("Drift Detector AUC:", auc)
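With the mean shift of 1.5 used above, the two samples are easy to tell apart, so you should see accuracy and AUC well above 0.5 (roughly 0.9 or higher; exact values depend on the split and seed).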

The performance of the classifier—measured by metrics such as accuracy and AUC (Area Under the ROC Curve)—directly reflects the presence of drift:

  • If the classifier achieves high accuracy or high AUC when distinguishing reference from current data, the two distributions are different enough for the model to separate them;
  • If the classifier performs close to random guessing (accuracy or AUC near 0.5), there is little to no detectable drift; the sanity-check sketch below illustrates this case.
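As a sanity check, here is a minimal sketch (assuming NumPy and scikit-learn, as in the example above) that runs the same detector on two samples drawn from the same distribution. With no drift by construction, the AUC should land near 0.5:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Two samples drawn from the SAME distribution: no drift by construction
rng = np.random.default_rng(0)
reference = rng.normal(loc=0, scale=1, size=(500, 2))
current = rng.normal(loc=0, scale=1, size=(500, 2))

# Label the origin of each row and stack into one dataset
X = np.vstack([reference, current])
y = np.concatenate([np.zeros(500), np.ones(500)])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

clf = LogisticRegression().fit(X_train, y_train)
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

# Identical distributions: expect AUC close to 0.5 (random guessing)
print("No-drift AUC:", auc)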

In short, the detector's performance doubles as a drift score: the better the classifier separates the two samples, the stronger the evidence that the underlying distribution has shifted.

Task

You're given two unlabeled datasets from different periods: a reference sample and a current sample. Treat “dataset origin” as a binary label (0 = reference, 1 = current). Train a simple classifier to predict origin; if the model separates them well, distribution shift is likely.

Steps:

  1. Generate synthetic ref and new data (given).
  2. Build domain labels y_domain (0 for ref, 1 for new) and stack into X.
  3. Split into train/test (test_size=0.3, random_state=42).
  4. Train LogisticRegression(max_iter=1000, random_state=0).
  5. Get probabilities on test set; compute auc_score = roc_auc_score(...).
  6. Set drift_detected = (auc_score >= 0.65) and print shapes, AUC, flag.

Solution
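A minimal sketch of these steps, assuming synthetic Gaussian data for ref and new (the task supplies its own generation code, so the distributions and sizes here are placeholders; the labels, split parameters, model settings, and the 0.65 threshold come from the task itself):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Step 1 - synthetic data (assumed shapes and distributions; the task provides its own)
rng = np.random.default_rng(42)
ref = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
new = rng.normal(loc=1.0, scale=1.0, size=(500, 2))

# Step 2 - domain labels: 0 for ref, 1 for new; stack features into X
X = np.vstack([ref, new])
y_domain = np.concatenate([np.zeros(len(ref)), np.ones(len(new))])

# Step 3 - train/test split for the drift detector
X_train, X_test, y_train, y_test = train_test_split(
    X, y_domain, test_size=0.3, random_state=42
)

# Step 4 - train the domain classifier
clf = LogisticRegression(max_iter=1000, random_state=0)
clf.fit(X_train, y_train)

# Step 5 - probabilities on the test set, then AUC
proba = clf.predict_proba(X_test)[:, 1]
auc_score = roc_auc_score(y_test, proba)

# Step 6 - flag drift at the task's threshold and report
drift_detected = (auc_score >= 0.65)
print("X shape:", X.shape, "| train:", X_train.shape, "| test:", X_test.shape)
print("AUC:", auc_score)
print("Drift detected:", drift_detected)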
