Cross-Validation and Scoring API | Model Selection and Evaluation Utilities
Mastering scikit-learn API and Workflows

Cross-Validation and Scoring API

Cross-validation is a key technique for evaluating the performance of machine learning models in scikit-learn. Instead of assessing a model on a single train-test split, cross-validation splits the data into several parts, called "folds," and trains and tests the model multiple times, each time with a different fold held out for validation. This approach provides a more reliable estimate of model performance and helps reduce the risk of overfitting to a particular subset of the data. In scikit-learn, the cross_val_score function makes it simple to apply cross-validation to any estimator, including pipelines that combine preprocessing and modeling steps.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Load example data
X, y = load_iris(return_X_y=True)

# Build a pipeline with preprocessing and a classifier
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression(max_iter=200, random_state=42))
])

# Evaluate the pipeline using 5-fold cross-validation
scores = cross_val_score(pipeline, X, y, cv=5)
print("Cross-validation scores:", scores)
print("Mean accuracy:", np.mean(scores))

The cross_val_score function accepts a scoring parameter that lets you specify how model performance should be measured. By default, cross_val_score relies on the estimator's own score method, which for most classifiers reports accuracy, but you can choose from a wide range of built-in metrics such as 'f1', 'precision', 'recall', 'roc_auc' (for binary classification), or 'neg_mean_squared_error' (for regression). To use a different metric, simply pass the desired metric's string identifier to the scoring argument. This flexibility lets you tailor evaluation to the goals of your project, ensuring that the chosen metric aligns with the real-world requirements of your task.
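
As a minimal sketch, the snippet below reuses the same iris pipeline as the example above and simply passes a scoring string to cross_val_score. Because iris is a multiclass problem, the macro-averaged 'f1_macro' variant is used here rather than the binary-only 'f1' string.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression(max_iter=200, random_state=42))
])

# Iris has three classes, so request the macro-averaged F1 score;
# the plain 'f1' scorer assumes binary targets and would raise an error here.
f1_scores = cross_val_score(pipeline, X, y, cv=5, scoring="f1_macro")
print("Macro F1 scores:", f1_scores)
print("Mean macro F1:", np.mean(f1_scores))

Regression estimators work the same way: passing scoring='neg_mean_squared_error' returns negated mean squared errors, so that higher scores always mean better performance.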

