Learn Metrics and Model Evaluation | Model Selection and Evaluation Utilities
Mastering scikit-learn API and Workflows

Metrics and Model Evaluation

When you train a machine learning model, you need to assess how well it performs. scikit-learn provides a dedicated metrics module that contains a variety of scoring functions for this purpose. These functions help you quantify the quality of your predictions, making it easier to compare models and tune parameters. For classification problems, two of the most commonly used metrics are accuracy_score and classification_report. The accuracy_score function measures the proportion of correct predictions, while the classification_report provides a detailed summary including precision, recall, f1-score, and support for each class.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Load data and split
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Build a pipeline
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
pipe.fit(X_train, y_train)

# Predict and evaluate
y_pred = pipe.predict(X_test)
acc = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
print("Accuracy:", acc)
print("Classification Report:\n", report)
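The printed report is convenient to read, but when you need the same numbers programmatically (for example, to compare models in code), classification_report can return a dictionary instead via its output_dict parameter. A minimal sketch on made-up binary labels, independent of the iris pipeline above:

```python
from sklearn.metrics import classification_report

# Made-up binary labels, not taken from the iris example
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]

# output_dict=True returns nested dicts keyed by class label (as a string),
# plus "accuracy", "macro avg", and "weighted avg" entries
report = classification_report(y_true, y_pred, output_dict=True)

print(report["1"]["precision"])  # precision for class 1
print(report["accuracy"])        # overall accuracy
```

This is handy for logging metrics or feeding them into a results table during model comparison.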

The accuracy_score function returns a single value between 0 and 1, indicating the fraction of correct predictions; the closer to 1, the more accurate the model. The classification_report output is a table that breaks down performance for each class, showing precision (the proportion of positive predictions that were actually correct), recall (the proportion of actual positives that were correctly identified), and f1-score (the harmonic mean of precision and recall). The support column tells you how many samples of each class appear in the test set. By integrating these metrics into your workflow, you can systematically compare models, identify strengths and weaknesses, and make informed decisions about further tuning or feature engineering. These tools are essential for building robust machine learning solutions using scikit-learn.
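The per-class numbers in the report can also be computed directly with precision_score, recall_score, and f1_score from sklearn.metrics. A minimal sketch on made-up multiclass labels (not the iris data above); note that multiclass targets require choosing an averaging strategy, such as average="macro", which averages the per-class scores with equal weight:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Made-up multiclass labels for illustration
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 1, 1, 0, 2, 2]

# "macro" averaging computes each class's score, then takes the unweighted mean
precision = precision_score(y_true, y_pred, average="macro")
recall = recall_score(y_true, y_pred, average="macro")
f1 = f1_score(y_true, y_pred, average="macro")

print("Macro precision:", precision)
print("Macro recall:", recall)
print("Macro F1:", f1)
```

Other choices include average="weighted" (weights each class by its support) and average="micro" (counts all predictions globally), so the right option depends on whether rare classes should count as much as common ones.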


Section 4. Chapter 3
