Monitoring Model Degradation
Model degradation refers to the decline in a machine learning model's predictive performance over time. This phenomenon is often linked to drift, which is the change in the distribution of input data that the model receives compared to the data it was trained on. When drift occurs, the relationship between input features and target variables may shift, causing the model to make less accurate predictions. Two common metrics used to quantify model performance are accuracy and AUC (Area Under the ROC Curve). A decrease in either metric signals potential model degradation, which may be a direct result of drift in the underlying data.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score

np.random.seed(0)

# Generate initial training data (no drift)
X_train, y_train = make_classification(
    n_samples=1000, n_features=10, n_informative=8,
    n_redundant=2, flip_y=0.01, random_state=0
)

# Train a simple model
model = LogisticRegression()
model.fit(X_train, y_train)

# Simulate performance over 10 time periods with increasing drift
accuracies = []
aucs = []
for t in range(10):
    # Introduce drift by shifting feature means over time.
    # Fixing random_state keeps the underlying data-generating
    # process identical across periods, so only the shift varies
    drift_strength = t * 0.2
    X_test, y_test = make_classification(
        n_samples=300, n_features=10, n_informative=8,
        n_redundant=2, flip_y=0.01, shift=drift_strength,
        random_state=0
    )
    y_pred = model.predict(X_test)
    y_proba = model.predict_proba(X_test)[:, 1]
    accuracies.append(accuracy_score(y_test, y_pred))
    aucs.append(roc_auc_score(y_test, y_proba))

# Plot accuracy and AUC over time
plt.figure(figsize=(8, 5))
plt.plot(range(10), accuracies, marker='o', label='Accuracy')
plt.plot(range(10), aucs, marker='s', label='AUC')
plt.xlabel('Time Period')
plt.ylabel('Metric Value')
plt.title('Model Performance Decay Due to Drift')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
Drift refers to changes in the distribution of input features, while degradation describes the resulting drop in model performance. Drift can occur without immediate degradation if the model is robust, but persistent or severe drift usually leads to performance degradation.
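Because drift can precede degradation, it is useful to monitor the input distributions themselves, not just the performance metrics. One common approach, sketched below, is a two-sample Kolmogorov-Smirnov test comparing a reference window of a feature (from training time) against a current production window; the specific windows and the 0.05 threshold here are illustrative assumptions, not prescribed values.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Reference window: one feature's values as seen at training time
reference = rng.normal(loc=0.0, scale=1.0, size=1000)

# Current window: the same feature in production, with a shifted mean
current = rng.normal(loc=0.5, scale=1.0, size=1000)

# The two-sample KS test checks whether both windows could plausibly
# come from the same distribution
stat, p_value = ks_2samp(reference, current)

if p_value < 0.05:
    print(f"Drift detected (KS statistic={stat:.3f}, p={p_value:.2e})")
else:
    print("No significant drift detected")
```

A test like this only flags a change in the feature distribution; whether that change actually hurts the model still has to be confirmed against performance metrics.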
To detect model degradation early, you should regularly monitor key model metrics such as accuracy and AUC on fresh incoming data. Sudden or gradual declines in these metrics can serve as early warning signs of drift affecting your model. By tracking these values over time, you can spot patterns that indicate when the model is no longer performing as expected. Interpreting these trends allows you to take corrective actions, such as retraining the model with updated data or investigating the source of drift, before the degradation has a significant impact on business outcomes.
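In practice, this kind of monitoring can be automated with a simple threshold rule. The sketch below assumes a hypothetical log of per-period accuracy values and compares a rolling mean against a deployment-time baseline; the baseline, tolerance, and window size are illustrative choices, not fixed recommendations.

```python
from collections import deque

# Hypothetical accuracy values logged at each evaluation period
accuracy_log = [0.91, 0.90, 0.89, 0.88, 0.84, 0.79, 0.72]

BASELINE = 0.90    # accuracy on a held-out set at deployment time
TOLERANCE = 0.05   # maximum acceptable drop before alerting
WINDOW = 3         # number of recent periods to average over

recent = deque(maxlen=WINDOW)
for t, acc in enumerate(accuracy_log):
    recent.append(acc)
    rolling_mean = sum(recent) / len(recent)
    if rolling_mean < BASELINE - TOLERANCE:
        print(f"Period {t}: rolling accuracy {rolling_mean:.3f} "
              f"is below threshold -- consider retraining")
```

Averaging over a window rather than alerting on a single bad period reduces false alarms from noise, at the cost of reacting slightly later to a genuine drop.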