Learn Model-Based Detection | Model-Based Monitoring
Feature Drift and Data Drift Detection

Model-Based Detection

Model-based drift detection is a practical way to identify changes in data distributions using machine learning classifiers.

  • Combine the reference (historical) data and the current (incoming) data into one dataset, labeling reference rows 0 and current rows 1;
  • Train a classifier, such as logistic regression, to predict that label;
  • The classifier can only succeed if there are systematic differences between the two groups;
  • If the classifier separates the datasets well on held-out data, the distributions have likely shifted (drifted).

This model-based approach is especially valuable for complex or high-dimensional data, where per-feature statistical tests may miss changes that only show up in how features relate to one another.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score, accuracy_score
    from sklearn.model_selection import train_test_split

    # Generate synthetic reference data (normal distribution)
    np.random.seed(42)
    reference = np.random.normal(loc=0, scale=1, size=(500, 2))
    reference_labels = np.zeros(reference.shape[0])

    # Generate synthetic current data (drifted: shifted mean)
    current = np.random.normal(loc=1.5, scale=1, size=(500, 2))
    current_labels = np.ones(current.shape[0])

    # Combine datasets
    X = np.vstack([reference, current])
    y = np.concatenate([reference_labels, current_labels])

    # Split into train/test for the drift detector
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    # Fit logistic regression classifier
    clf = LogisticRegression()
    clf.fit(X_train, y_train)

    # Predict and evaluate
    y_pred = clf.predict(X_test)
    y_pred_proba = clf.predict_proba(X_test)[:, 1]
    accuracy = accuracy_score(y_test, y_pred)
    auc = roc_auc_score(y_test, y_pred_proba)

    print("Drift Detector Accuracy:", accuracy)
    print("Drift Detector AUC:", auc)

The classifier's performance on held-out data, measured by metrics such as accuracy and AUC (Area Under the ROC Curve), indicates how much drift is present:

  • If the classifier achieves high accuracy or high AUC when distinguishing reference from current data, the two distributions are different enough for the model to separate them;
  • If the classifier performs close to random guessing (AUC near 0.5, or accuracy near 0.5 when the two samples are the same size, as here), there is little to no drift that this classifier can detect.
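
To make the two cases concrete, the sketch below scores a drifted pair and a no-drift pair with the same detector. The helper name `domain_auc`, the sample sizes, and the use of 5-fold cross-validation in place of a single train/test split are illustrative assumptions, not part of the lesson's code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def domain_auc(reference, current, cv=5):
    """Cross-validated AUC of a classifier that predicts dataset origin.

    Hypothetical helper for illustration: 0 = reference, 1 = current.
    """
    X = np.vstack([reference, current])
    y = np.concatenate([
        np.zeros(len(reference), dtype=int),
        np.ones(len(current), dtype=int),
    ])
    clf = LogisticRegression(max_iter=1000)
    # scoring="roc_auc" evaluates every held-out fold with AUC
    scores = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
    return scores.mean()


rng = np.random.default_rng(42)
reference = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
same_dist = rng.normal(loc=0.0, scale=1.0, size=(500, 2))  # no drift
shifted = rng.normal(loc=1.5, scale=1.0, size=(500, 2))    # mean shift

auc_no_drift = domain_auc(reference, same_dist)
auc_drift = domain_auc(reference, shifted)

print(f"AUC, no drift:   {auc_no_drift:.3f}")  # should land near 0.5
print(f"AUC, mean shift: {auc_drift:.3f}")     # should be well above 0.5
```

Averaging over folds makes the score less dependent on one particular split, which matters most when the AUC sits close to whatever alert threshold is in use.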

This model-based approach provides a flexible and scalable way to monitor for distribution shifts, especially in complex or high-dimensional feature spaces where traditional statistical tests may fail to capture subtle changes.
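
As a hedged illustration of a change that per-feature tests can miss, the sketch below builds two samples whose individual features are both standard normal but whose correlation differs. SciPy's two-sample Kolmogorov-Smirnov test (`scipy.stats.ks_2samp`) is run on each feature, and the domain classifier is swapped for a `RandomForestClassifier`, because a linear model such as logistic regression cannot pick up a correlation-only change. The data, sample sizes, and model choice are assumptions made for this example.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 1000

# Reference: two independent standard-normal features
reference = rng.multivariate_normal([0, 0], [[1.0, 0.0], [0.0, 1.0]], size=n)
# Current: identical marginals, but the features are now strongly correlated
current = rng.multivariate_normal([0, 0], [[1.0, 0.9], [0.9, 1.0]], size=n)

# Per-feature KS tests only compare the marginal distributions
ks_pvalues = [ks_2samp(reference[:, j], current[:, j]).pvalue for j in range(2)]

X = np.vstack([reference, current])
y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])


def cv_auc(model):
    """Mean 5-fold cross-validated AUC of a domain classifier."""
    return cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()


linear_auc = cv_auc(LogisticRegression(max_iter=1000))
forest_auc = cv_auc(RandomForestClassifier(n_estimators=100, random_state=0))

print("KS p-values per feature:", np.round(ks_pvalues, 3))
print(f"Logistic regression AUC: {linear_auc:.3f}")
print(f"Random forest AUC:       {forest_auc:.3f}")
```

Since the marginals match by construction, the KS tests have no real difference to find, while the random forest can exploit the joint structure. The linear detector staying near 0.5 is a reminder that a low AUC only rules out drift the chosen classifier is able to represent.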

Task

You're given two unlabeled datasets from different periods: a reference sample and a current sample. Treat “dataset origin” as a binary label (0 = reference, 1 = current). Train a simple classifier to predict origin; if the model separates them well, distribution shift is likely.

Steps:

  1. Generate synthetic ref and new data (given).
  2. Build domain labels y_domain (0 for ref, 1 for new) and stack into X.
  3. Split into train/test (test_size=0.3, random_state=42).
  4. Train LogisticRegression(max_iter=1000, random_state=0).
  5. Get probabilities on test set; compute auc_score = roc_auc_score(...).
  6. Set drift_detected = (auc_score >= 0.65) and print shapes, AUC, flag.
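
One possible way to work through these steps is sketched below. The exercise's own data generation is not shown on this page, so the synthetic `ref` and `new` arrays here (sample sizes, three features, a mean shift of 1.0) are stand-ins assumed for illustration; the remaining names and parameters follow the step list.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Step 1: assumed stand-in data (the exercise supplies its own)
rng = np.random.default_rng(42)
ref = rng.normal(loc=0.0, scale=1.0, size=(400, 3))
new = rng.normal(loc=1.0, scale=1.0, size=(400, 3))

# Step 2: domain labels (0 = ref, 1 = new) and stacked feature matrix
X = np.vstack([ref, new])
y_domain = np.concatenate([
    np.zeros(len(ref), dtype=int),
    np.ones(len(new), dtype=int),
])

# Step 3: train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y_domain, test_size=0.3, random_state=42
)

# Step 4: train the domain classifier
clf = LogisticRegression(max_iter=1000, random_state=0)
clf.fit(X_train, y_train)

# Step 5: probability of "new" on the test set, then AUC
proba = clf.predict_proba(X_test)[:, 1]
auc_score = roc_auc_score(y_test, proba)

# Step 6: flag drift using the threshold from the task statement
drift_detected = auc_score >= 0.65

print("X shape:", X.shape, "| y_domain shape:", y_domain.shape)
print(f"AUC: {auc_score:.3f}")
print("Drift detected:", drift_detected)
```

The 0.65 cutoff is taken from the task statement; in practice the threshold would need tuning to the sample size and to how much drift the downstream model tolerates.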

Section 3. Chapter 1