Data Splitting and Resampling
Predictive Modeling with Tidymodels in R

When building predictive models, you must ensure that your model can generalize well to new, unseen data. This is where data splitting becomes crucial. By dividing your dataset into separate training and testing sets, you can train your model on one portion of the data and evaluate its performance on another. This helps prevent overfitting, where a model learns the training data too well and fails to perform on new data. Data splitting provides a realistic estimate of how your model will behave in real-world scenarios.

    options(crayon.enabled = FALSE)
    library(tidymodels)

    # Load example dataset
    data(ames, package = "modeldata")

    # Split the data: 80% for training, 20% for testing
    set.seed(123)
    data_split <- initial_split(ames, prop = 0.8)

    # Extract training and testing sets
    train_data <- training(data_split)
    test_data <- testing(data_split)

    # Check the number of rows in each set
    nrow(train_data)
    nrow(test_data)

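After splitting your data, you often want to validate your model further with resampling methods. These are applied to the training set, so the test set stays untouched for the final evaluation. Tidymodels provides tools for techniques like cross-validation and bootstrapping, such as vfold_cv() and bootstraps() from the rsample package.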

  • Cross-validation divides your training data into several folds of roughly equal size;
  • The model is trained on all folds but one and validated on the remaining fold;
  • This process is repeated so that every fold serves as the validation set exactly once.
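
The cross-validation steps above can be sketched as follows. This is a minimal illustration, not a recommended model: the choice of 5 folds, the plain lm() fit, the two predictors (Gr_Liv_Area and Year_Built), and the names folds and fold_rmse are all assumptions made for the example. In a full tidymodels workflow the same loop is usually handled by fit_resamples() from the tune package.

```r
library(rsample)

# Same dataset and split as in the splitting example
data(ames, package = "modeldata")
set.seed(123)
data_split <- initial_split(ames, prop = 0.8)
train_data <- training(data_split)

# Divide the training data into 5 folds (v = 5 is an arbitrary choice here)
set.seed(456)
folds <- vfold_cv(train_data, v = 5)

# For each fold: fit on the other folds, validate on the held-out fold
fold_rmse <- vapply(folds$splits, function(split) {
  fit_data <- analysis(split)    # rows used for training in this iteration
  held_out <- assessment(split)  # the single fold kept for validation
  fit <- lm(log10(Sale_Price) ~ Gr_Liv_Area + Year_Built, data = fit_data)
  pred <- predict(fit, newdata = held_out)
  sqrt(mean((log10(held_out$Sale_Price) - pred)^2))
}, numeric(1))

fold_rmse        # one RMSE per fold
mean(fold_rmse)  # cross-validated estimate of the error
```

Because each row of the training data lands in exactly one assessment set, the averaged error uses every training observation for validation once.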

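Bootstrapping, on the other hand, generates multiple samples from the training data by drawing rows with replacement, each sample being the same size as the training set. The rows that were not drawn in a given sample can be used to evaluate the model fitted on that sample, and the spread of results across samples estimates the variability in your model's performance. Both methods help you assess model stability and ensure your results are not due to one particular split of the data.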
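
As a rough sketch of bootstrapping with rsample, the example below draws 25 bootstrap samples and looks at how the error varies across them. The number of samples, the lm() model, the chosen predictors, and the names boots and boot_rmse are assumptions for illustration only.

```r
library(rsample)

# Recreate the training set from the splitting example
data(ames, package = "modeldata")
set.seed(123)
train_data <- training(initial_split(ames, prop = 0.8))

# Draw 25 bootstrap samples (rows sampled with replacement)
set.seed(789)
boots <- bootstraps(train_data, times = 25)

# Fit on each bootstrap sample, evaluate on the rows that were not drawn
boot_rmse <- vapply(boots$splits, function(split) {
  in_sample <- analysis(split)      # bootstrap sample, same size as train_data
  out_of_bag <- assessment(split)   # rows left out of this sample
  fit <- lm(log10(Sale_Price) ~ Gr_Liv_Area + Year_Built, data = in_sample)
  pred <- predict(fit, newdata = out_of_bag)
  sqrt(mean((log10(out_of_bag$Sale_Price) - pred)^2))
}, numeric(1))

mean(boot_rmse)  # average error across bootstrap samples
sd(boot_rmse)    # variability of the error estimate
```

A small standard deviation relative to the mean suggests that the model's performance does not depend much on which rows happened to be sampled.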


Why is it important to split your data and use resampling techniques when building predictive models?
