Cross-Validation and Scoring API
Cross-validation is a key technique for evaluating the performance of machine learning models in scikit-learn. Instead of assessing a model on a single train-test split, cross-validation splits the data into several parts, called "folds," and trains and tests the model multiple times, each time with a different fold held out for validation. This approach provides a more reliable estimate of model performance and helps reduce the risk of overfitting to a particular subset of the data. In scikit-learn, the cross_val_score function makes it simple to apply cross-validation to any estimator, including pipelines that combine preprocessing and modeling steps.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Load example data
X, y = load_iris(return_X_y=True)

# Build a pipeline with preprocessing and a classifier
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression(max_iter=200, random_state=42))
])

# Evaluate the pipeline using 5-fold cross-validation
scores = cross_val_score(pipeline, X, y, cv=5)
print("Cross-validation scores:", scores)
print("Mean accuracy:", np.mean(scores))
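The cv argument is not limited to an integer fold count; it also accepts a splitter object. For classification targets, an integer cv uses stratified folds without shuffling, so passing a splitter explicitly is how you control details such as shuffling. As a minimal sketch reusing the pipeline and data above:

from sklearn.model_selection import StratifiedKFold

# Same evaluation as before, but with an explicit splitter so the data is
# shuffled before being divided into stratified folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
shuffled_scores = cross_val_score(pipeline, X, y, cv=cv)
print("Shuffled-fold scores:", shuffled_scores)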
The cross_val_score function accepts a scoring parameter that lets you specify how model performance should be measured. By default, cross_val_score uses the estimator's own score method, which is accuracy for classifiers and R² for regressors, but you can choose from a wide range of built-in metrics such as 'f1', 'precision', 'recall', and 'roc_auc' for binary classification, or 'neg_mean_squared_error' for regression. To use a different metric, simply pass the desired metric's string identifier to the scoring argument. This flexibility lets you tailor evaluation to the goals of your project, ensuring that the chosen metric reflects the real-world requirements of your task.
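As a short sketch continuing the example above: because the iris targets are multiclass, the plain 'f1' scorer (which assumes binary targets) would raise an error, so an averaged variant such as 'f1_macro' is used here instead.

# Score the same pipeline with macro-averaged F1 instead of the default accuracy
f1_scores = cross_val_score(pipeline, X, y, cv=5, scoring="f1_macro")
print("F1 (macro) scores:", f1_scores)
print("Mean F1 (macro):", f1_scores.mean())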