Cross-Validation and Scoring API
Cross-validation is a key technique for evaluating the performance of machine learning models in scikit-learn. Instead of assessing a model on a single train-test split, cross-validation splits the data into several parts, called "folds," and trains and tests the model multiple times, each time with a different fold held out for validation. This approach provides a more reliable estimate of model performance and helps reduce the risk of overfitting to a particular subset of the data. In scikit-learn, the cross_val_score function makes it simple to apply cross-validation to any estimator, including pipelines that combine preprocessing and modeling steps.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Load example data
X, y = load_iris(return_X_y=True)

# Build a pipeline with preprocessing and a classifier
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression(max_iter=200, random_state=42))
])

# Evaluate the pipeline using 5-fold cross-validation
scores = cross_val_score(pipeline, X, y, cv=5)
print("Cross-validation scores:", scores)
print("Mean accuracy:", np.mean(scores))
The cross_val_score function accepts a scoring parameter that lets you specify how model performance should be measured. By default, it relies on the estimator's own score method, which is accuracy for classifiers and R² for regressors. You can instead choose from a wide range of built-in metrics such as 'f1', 'precision', 'recall', 'roc_auc' (for binary classification), or 'neg_mean_squared_error' (for regression) by passing the metric's string identifier to the scoring argument. This flexibility allows you to tailor evaluation to the goals of your project, ensuring that the chosen metric aligns with the real-world requirements of your task.
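As a minimal sketch of switching metrics, the snippet below reuses the pipeline from the example above but passes an explicit scoring string. Since iris is a multiclass problem, the plain 'f1' identifier (which assumes binary targets) is replaced with the macro-averaged variant 'f1_macro'; everything else stays the same.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression(max_iter=200, random_state=42))
])

# Same 5-fold cross-validation as before, but scored with macro-averaged F1
# instead of the classifier's default accuracy.
f1_scores = cross_val_score(pipeline, X, y, cv=5, scoring="f1_macro")
print("Macro F1 per fold:", f1_scores)
print("Mean macro F1:", np.mean(f1_scores))

The full list of valid scoring strings is available from sklearn.metrics.get_scorer_names(), which is a convenient way to check the exact identifier before passing it to cross_val_score.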