Evaluation Metrics in Machine Learning

ROC Curve and AUC

To assess how well a binary classifier distinguishes between two classes across all possible thresholds, you use the Receiver Operating Characteristic (ROC) curve. The ROC curve visualizes the trade-off between the true positive rate (TPR, also called sensitivity or recall) and the false positive rate (FPR) as you vary the classification threshold.

  • True Positive Rate (TPR) is the proportion of actual positives correctly identified by the classifier. It is calculated as:

    \text{TPR} = \frac{TP}{TP + FN}

  • False Positive Rate (FPR) is the proportion of actual negatives that are incorrectly classified as positive. It is calculated as:

    \text{FPR} = \frac{FP}{FP + TN}
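
Both rates can be read directly off a confusion matrix. Below is a minimal, standalone sketch (the labels and predictions are made-up values, not data from this chapter) that computes TPR and FPR using scikit-learn's confusion_matrix:

import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels and hard predictions, for illustration only
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# For binary labels {0, 1}, ravel() returns TN, FP, FN, TP in that order
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

tpr = tp / (tp + fn)  # sensitivity / recall
fpr = fp / (fp + tn)

print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")

Note that a single set of hard predictions gives only one (FPR, TPR) pair; the ROC curve is obtained by repeating this calculation for every classification threshold, as described next.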

By plotting TPR against FPR for every threshold, the ROC curve provides a comprehensive picture of a model’s performance, rather than focusing on a single decision point. The Area Under the Curve (AUC) summarizes this performance: a higher AUC means the model is better at distinguishing between the positive and negative classes across all thresholds.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

# Generate synthetic binary classification data
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=2,
    n_redundant=10,
    n_classes=2,
    random_state=42
)

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit a logistic regression classifier
clf = LogisticRegression()
clf.fit(X_train, y_train)

# Get predicted probabilities for the positive class
y_scores = clf.predict_proba(X_test)[:, 1]

# Compute ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_scores)

# Compute AUC
auc_score = roc_auc_score(y_test, y_scores)

# Plot ROC curve
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, label=f"ROC curve (AUC = {auc_score:.2f})")
plt.plot([0, 1], [0, 1], "k--", label="Random Classifier")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate (Recall)")
plt.title("ROC Curve")
plt.legend(loc="lower right")
plt.show()
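
The thresholds array returned by roc_curve makes the "every threshold" idea explicit: each threshold produces exactly one (FPR, TPR) point on the curve. Here is a small standalone sketch, using a handful of made-up scores separate from the example above, that prints those points:

import numpy as np
from sklearn.metrics import roc_curve

# Made-up labels and predicted scores, for illustration only
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_scores = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.55, 0.90])

fpr, tpr, thresholds = roc_curve(y_true, y_scores)

# Each threshold corresponds to one point on the ROC curve
for thr, f, t in zip(thresholds, fpr, tpr):
    print(f"score >= {thr:.2f}: FPR = {f:.2f}, TPR = {t:.2f}")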

When you interpret the ROC curve, a curve that bows toward the top left corner indicates a strong classifier, as it achieves high true positive rates with low false positive rates. The AUC quantifies this: an AUC of 0.5 means the classifier performs no better than random guessing, while an AUC of 1.0 indicates perfect discrimination between classes. Generally, an AUC above 0.8 is considered good, while values closer to 1.0 are excellent. However, the context of your problem and the class distribution should always guide your interpretation of ROC and AUC results.
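
These reference points are easy to check empirically. The following standalone sketch (synthetic labels and scores, unrelated to the example above) compares the AUC of uninformative random scores, which should land near 0.5, with scores that rank every positive above every negative, which yield an AUC of 1.0:

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, size=1000)

# Scores that ignore the labels entirely -> AUC close to 0.5
random_scores = rng.random(1000)

# Scores that rank every positive above every negative -> AUC of 1.0
perfect_scores = y_true + 0.1 * rng.random(1000)

print("Random scores AUC:  ", round(roc_auc_score(y_true, random_scores), 3))
print("Perfect ranking AUC:", round(roc_auc_score(y_true, perfect_scores), 3))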

