Metrics and Model Evaluation
When you train a machine learning model, you need to assess how well it performs. scikit-learn provides a dedicated metrics module that contains a variety of scoring functions for this purpose. These functions help you quantify the quality of your predictions, making it easier to compare models and tune parameters. For classification problems, two of the most commonly used metrics are accuracy_score and classification_report. The accuracy_score function measures the proportion of correct predictions, while the classification_report provides a detailed summary including precision, recall, f1-score, and support for each class.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Load data and split
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Build a pipeline
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
pipe.fit(X_train, y_train)

# Predict and evaluate
y_pred = pipe.predict(X_test)
acc = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

print("Accuracy:", acc)
print("Classification Report:\n", report)
accuracy_score returns a single value between 0 and 1, indicating the fraction of correct predictions; the closer it is to 1, the more accurate the model. The classification_report output is a table that breaks down performance for each class, showing precision (the proportion of positive predictions that were actually correct), recall (the proportion of actual positives that were correctly identified), and f1-score (the harmonic mean of precision and recall). The support column tells you how many samples belong to each class in the test set. By integrating these metrics into your workflow, you can systematically compare models, identify strengths and weaknesses, and make informed decisions about further tuning or feature engineering. These tools are essential for building robust machine learning solutions with scikit-learn.
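If you want the individual numbers from the report as values you can work with programmatically, scikit-learn also exposes precision_score, recall_score, and f1_score directly. The sketch below is a minimal illustration, assuming it runs right after the pipeline code above (it reuses y_test and y_pred, and that every class has at least one prediction); it pulls the per-class scores and checks that f1 really is the harmonic mean of precision and recall.

from sklearn.metrics import precision_score, recall_score, f1_score

# average=None returns one score per class instead of a single aggregate
precision = precision_score(y_test, y_pred, average=None)
recall = recall_score(y_test, y_pred, average=None)
f1 = f1_score(y_test, y_pred, average=None)

# The harmonic mean of precision and recall reproduces the f1 values
# (element-wise, since the scores come back as NumPy arrays)
manual_f1 = 2 * precision * recall / (precision + recall)

print("Precision per class:", precision)
print("Recall per class:", recall)
print("F1 per class:", f1)
print("Matches manual harmonic mean:", (abs(f1 - manual_f1) < 1e-12).all())

Relatedly, passing output_dict=True to classification_report returns the same table as a nested dictionary, which is convenient for logging or automated comparisons.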