Cross-Validation and Scoring API
Cross-validation is a key technique for evaluating the performance of machine learning models in scikit-learn. Instead of assessing a model on a single train-test split, cross-validation splits the data into several parts, called "folds," and trains and tests the model multiple times, each time with a different fold held out for validation. This approach provides a more reliable estimate of model performance and helps reduce the risk of overfitting to a particular subset of the data. In scikit-learn, the cross_val_score function makes it simple to apply cross-validation to any estimator, including pipelines that combine preprocessing and modeling steps.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Load example data
X, y = load_iris(return_X_y=True)

# Build a pipeline with preprocessing and a classifier
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression(max_iter=200, random_state=42))
])

# Evaluate the pipeline using 5-fold cross-validation
scores = cross_val_score(pipeline, X, y, cv=5)
print("Cross-validation scores:", scores)
print("Mean accuracy:", np.mean(scores))
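The cv argument is not limited to an integer fold count; it also accepts a splitter object. For classification targets, an integer cv uses stratified folds without shuffling, so passing a splitter explicitly is how you control details such as shuffling. As a minimal sketch reusing the pipeline and data above:

from sklearn.model_selection import StratifiedKFold

# Same evaluation as before, but with an explicit splitter so the data is
# shuffled before being divided into stratified folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
shuffled_scores = cross_val_score(pipeline, X, y, cv=cv)
print("Shuffled-fold scores:", shuffled_scores)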
The cross_val_score function accepts a scoring parameter that lets you specify how model performance should be measured. By default, cross_val_score uses the estimator's own score method, which is accuracy for classifiers and R² for regressors, but you can choose from a wide range of built-in metrics such as 'f1', 'precision', 'recall', and 'roc_auc' for binary classification, or 'neg_mean_squared_error' for regression. To use a different metric, simply pass the desired metric's string identifier to the scoring argument. This flexibility lets you tailor evaluation to the goals of your project, ensuring that the chosen metric reflects the real-world requirements of your task.
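As a short sketch continuing the example above: because the iris targets are multiclass, the plain 'f1' scorer (which assumes binary targets) would raise an error, so an averaged variant such as 'f1_macro' is used here instead.

# Score the same pipeline with macro-averaged F1 instead of the default accuracy
f1_scores = cross_val_score(pipeline, X, y, cv=5, scoring="f1_macro")
print("F1 (macro) scores:", f1_scores)
print("Mean F1 (macro):", f1_scores.mean())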