Machine Learning Foundations with Scikit-Learn

Cross-Validation


The train-test split has two drawbacks:

  1. Less training data, which may reduce model quality;
  2. Dependence on the random split, causing unstable results.

To overcome this, we use cross-validation.
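The second drawback is easy to demonstrate. In the sketch below (which uses scikit-learn's built-in Iris dataset rather than the course's penguins file, so it runs without a download), the same model is evaluated on three different random splits; the reported accuracy may change with each seed:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# The test accuracy depends on which rows land in the test set,
# so different random_state values can give different scores
for seed in (0, 1, 2):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = KNeighborsClassifier().fit(X_train, y_train)
    print(f"seed={seed}: accuracy={model.score(X_test, y_test):.3f}")
```

Because a single split can be lucky or unlucky, none of these individual numbers is a trustworthy estimate on its own.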

First, divide the entire dataset into 5 equal parts, known as folds.

Next, use one fold as the test set and combine the remaining folds to form the training set.

As in any evaluation process, the training set is used to train the model, while the test set is used to measure its performance.

The process is repeated so that each fold serves as the test set once, while the remaining folds form the training set.
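The fold rotation described above can be sketched with scikit-learn's KFold splitter. Using a toy array of 5 samples and 5 folds, each sample serves as the test set exactly once:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(5, 2)  # 5 toy samples, 2 features each

kf = KFold(n_splits=5)
# Each iteration yields index arrays: one fold held out for testing,
# the remaining folds combined for training
for i, (train_idx, test_idx) in enumerate(kf.split(X)):
    print(f"Split {i}: train={train_idx}, test={test_idx}")
```

Note that cross_val_score() performs this splitting internally; you only need KFold directly when you want explicit control over the splits.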

Cross-validation produces multiple accuracy scores—one per split. Their mean represents the model’s average performance. In Python, this is computed with cross_val_score().

Note

You can choose any number of folds. For example, using 10 folds means training on 9 parts and testing on 1. This is set via the cv parameter in cross_val_score().

import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins_pipelined.csv')
# Assign X, y variables (X is already preprocessed and y is already encoded)
X, y = df.drop('species', axis=1), df['species']
# Print the cross-val scores and the mean for KNeighborsClassifier with 5 neighbors
scores = cross_val_score(KNeighborsClassifier(), X, y)
print(scores)
print(scores.mean())
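To see the cv parameter from the note in action, here is a minimal sketch (again on the built-in Iris dataset, so it runs offline): passing cv=10 yields ten scores, one per fold.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# cv=10: each of the 10 folds serves as the test set once
scores = cross_val_score(KNeighborsClassifier(), X, y, cv=10)
print(len(scores))       # one score per fold
print(scores.mean())     # average performance across folds
```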

Cross-validation is more reliable but slower, since the model is trained and evaluated once per fold. It is widely used in hyperparameter tuning, where cross-validation is repeated for each candidate hyperparameter value, such as multiple k values in k-NN. This helps choose the option that consistently performs best.
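A minimal tuning sketch, assuming the built-in Iris dataset: run 5-fold cross-validation for several candidate values of k and keep the one with the best mean accuracy.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Cross-validate each candidate k and record its mean accuracy
results = {}
for k in (1, 3, 5, 7, 9):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    results[k] = scores.mean()
    print(f"k={k}: mean accuracy={results[k]:.3f}")

best_k = max(results, key=results.get)
print(f"Best k: {best_k}")
```

In practice this search is often delegated to GridSearchCV, which wraps exactly this loop-plus-cross-validation pattern.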


Why may cross-validation be preferred to train-test split for evaluating the performance of a machine learning model?

Select the correct answer


Section 1. Chapter 26
