Assessing Calibration Stability | Applied Calibration Workflows
Model Calibration with Python


Calibration stability refers to how consistently a model's calibration performance holds up when evaluated on different data splits or over various time periods. In practice, you rarely have access to all possible data, so you assess your model using subsets—train/test splits or cross-validation folds. If your calibration metrics, such as Expected Calibration Error (ECE), change significantly from one split to another, this is a sign that your calibration results may not generalize well. High stability means your calibration method produces similar results across different samples, which is crucial for deploying reliable models in real-world scenarios.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

# Create synthetic data
X, y = make_classification(n_samples=2000, n_features=5, n_informative=3, random_state=42)

# First random split
X_train1, X_test1, y_train1, y_test1 = train_test_split(X, y, test_size=0.4, random_state=1)
clf1 = LogisticRegression(max_iter=1000)
clf1.fit(X_train1, y_train1)
calibrator1 = CalibratedClassifierCV(clf1, method="isotonic", cv=3)
calibrator1.fit(X_train1, y_train1)
probs1 = calibrator1.predict_proba(X_test1)[:, 1]
brier1 = brier_score_loss(y_test1, probs1)
prob_true1, prob_pred1 = calibration_curve(y_test1, probs1, n_bins=10)
ece1 = np.abs(prob_true1 - prob_pred1).mean()

# Second random split
X_train2, X_test2, y_train2, y_test2 = train_test_split(X, y, test_size=0.4, random_state=22)
clf2 = LogisticRegression(max_iter=1000)
clf2.fit(X_train2, y_train2)
calibrator2 = CalibratedClassifierCV(clf2, method="isotonic", cv=3)
calibrator2.fit(X_train2, y_train2)
probs2 = calibrator2.predict_proba(X_test2)[:, 1]
brier2 = brier_score_loss(y_test2, probs2)
prob_true2, prob_pred2 = calibration_curve(y_test2, probs2, n_bins=10)
ece2 = np.abs(prob_true2 - prob_pred2).mean()

print(f"Split 1: ECE = {ece1:.4f}, Brier = {brier1:.4f}")
print(f"Split 2: ECE = {ece2:.4f}, Brier = {brier2:.4f}")
```

When you compare calibration metrics like ECE across different train/test splits, you gain insight into the robustness of your calibration method. If the ECE values remain close, you can be more confident that your calibration will generalize to new data. However, if you observe large swings in ECE, it may indicate that your calibration is sensitive to the particular data split, possibly due to small sample sizes, data drift, or overfitting by the calibration method itself. Consistent calibration performance is especially important in applications where model confidence directly impacts decision-making.
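Two splits give only a rough impression of stability. One way to make the comparison more systematic is to repeat the split-calibrate-evaluate loop over many random seeds and summarize the spread of the ECE values. The sketch below follows the same setup as the example above; the helper name `ece_for_seed` and the choice of 10 seeds are illustrative assumptions, not part of any library API.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.model_selection import train_test_split

def ece_for_seed(X, y, seed, n_bins=10):
    """Train, calibrate, and compute ECE for one random train/test split."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, random_state=seed)
    clf = LogisticRegression(max_iter=1000)
    calibrator = CalibratedClassifierCV(clf, method="isotonic", cv=3)
    calibrator.fit(X_tr, y_tr)
    probs = calibrator.predict_proba(X_te)[:, 1]
    prob_true, prob_pred = calibration_curve(y_te, probs, n_bins=n_bins)
    return np.abs(prob_true - prob_pred).mean()

X, y = make_classification(n_samples=2000, n_features=5, n_informative=3, random_state=42)

# Repeat over several seeds and summarize the spread of ECE
eces = np.array([ece_for_seed(X, y, seed) for seed in range(10)])
print(f"ECE mean = {eces.mean():.4f}, std = {eces.std():.4f}")
```

A small standard deviation relative to the mean suggests the calibration method is stable for this data size; a large one is the same warning sign as the two-split comparison, just quantified.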

1. What does high variability in ECE across different train/test splits suggest about your model's calibration?

2. How can you improve calibration stability when you notice high variability in ECE across splits?

