ROC Curve and AUC
To assess how well a binary classifier distinguishes between two classes across all possible thresholds, you use the Receiver Operating Characteristic (ROC) curve. The ROC curve visualizes the trade-off between the true positive rate (TPR, also called sensitivity or recall) and the false positive rate (FPR) as you vary the classification threshold.
- True Positive Rate (TPR) is the proportion of actual positives correctly identified by the classifier. It is calculated as:
  $\mathrm{TPR} = \frac{TP}{TP + FN}$
- False Positive Rate (FPR) is the proportion of actual negatives that are incorrectly classified as positive. It is calculated as:
  $\mathrm{FPR} = \frac{FP}{FP + TN}$
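To make these two definitions concrete, here is a minimal sketch that computes TPR and FPR by hand from a confusion matrix at a single threshold. The labels, scores, and the 0.5 cutoff are purely illustrative, not part of the example that follows later.

import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])                       # ground-truth labels (illustrative)
y_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.6, 0.2, 0.05])   # predicted probabilities (illustrative)
y_pred = (y_scores >= 0.5).astype(int)                            # apply one fixed threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
tpr = tp / (tp + fn)   # true positive rate (sensitivity / recall)
fpr = fp / (fp + tn)   # false positive rate
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")

Sweeping the threshold from 0 to 1 and recording each (FPR, TPR) pair is exactly what tracing out the ROC curve amounts to.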
By plotting TPR against FPR for every threshold, the ROC curve provides a comprehensive picture of a model’s performance, rather than focusing on a single decision point. The Area Under the Curve (AUC) summarizes this performance: a higher AUC means the model is better at distinguishing between the positive and negative classes across all thresholds.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

# Generate synthetic binary classification data
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=2,
    n_redundant=10,
    n_classes=2,
    random_state=42
)

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit a logistic regression classifier
clf = LogisticRegression()
clf.fit(X_train, y_train)

# Get predicted probabilities for the positive class
y_scores = clf.predict_proba(X_test)[:, 1]

# Compute ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_scores)

# Compute AUC
auc_score = roc_auc_score(y_test, y_scores)

# Plot ROC curve
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, label=f"ROC curve (AUC = {auc_score:.2f})")
plt.plot([0, 1], [0, 1], "k--", label="Random Classifier")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate (Recall)")
plt.title("ROC Curve")
plt.legend(loc="lower right")
plt.show()
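As a quick check on what "area under the curve" means, the following optional snippet reuses the fpr, tpr, and auc_score variables from the example above (it assumes that snippet has already been run) and compares roc_auc_score against a direct trapezoidal integration of the ROC points.

# Optional check: the AUC reported by roc_auc_score matches the trapezoidal
# area under the (fpr, tpr) points returned by roc_curve above.
manual_auc = np.trapz(tpr, fpr)  # integrate TPR with respect to FPR
print(f"roc_auc_score: {auc_score:.4f}  trapezoidal area: {manual_auc:.4f}")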
When you interpret the ROC curve, a curve that bows toward the top left corner indicates a strong classifier, as it achieves high true positive rates with low false positive rates. The AUC quantifies this: an AUC of 0.5 means the classifier performs no better than random guessing, while an AUC of 1.0 indicates perfect discrimination between classes. Generally, an AUC above 0.8 is considered good, while values closer to 1.0 are excellent. However, the context of your problem and the class distribution should always guide your interpretation of ROC and AUC results.
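To see why an AUC of 0.5 corresponds to random guessing, here is a small self-contained sketch (the sample size and seed are arbitrary): scoring examples with pure noise, unrelated to the labels, yields an AUC very close to 0.5.

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=10_000)   # random binary ground-truth labels
noise_scores = rng.random(size=10_000)     # scores with no relation to the labels
print(f"AUC of a random scorer: {roc_auc_score(y_true, noise_scores):.3f}")  # ~0.5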