Learn Building a Simple Machine Learning Workflow | Model Evaluation and Machine Learning Workflows
R for Data Scientists

Building a Simple Machine Learning Workflow

When you build machine learning models, every step, from data preprocessing to modeling, should be organized and reproducible. Chaining these steps together in a single pipeline reduces the risk of errors and makes your workflow easier to share and repeat. The tidymodels framework in R provides tools to combine preprocessing and modeling steps into one tidy pipeline, making your analysis more robust and transparent.

    library(tidymodels)
    options(crayon.enabled = FALSE)

    # Define the model specification
    lm_spec <- linear_reg() %>%
      set_engine("lm")

    # Build the workflow
    lm_workflow <- workflow() %>%
      add_formula(mpg ~ disp + hp) %>%
      add_model(lm_spec)

    print(lm_workflow)

    # Fit the workflow to the data
    fit_lm <- lm_workflow %>%
      fit(data = mtcars)

    print(fit_lm)

You begin by loading the tidymodels package, which provides the tools for building machine learning pipelines. The first step is to define a model specification using linear_reg(), which describes a linear regression model without fitting it yet. You then specify the computational engine with set_engine("lm"), so that the model is fitted with R’s built-in lm() function.
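To see what a specification contains before any data is involved, you can print it and ask parsnip how it maps to the engine. The following is a minimal sketch that reuses the lesson's lm_spec; translate() is a parsnip function that shows the call template to be filled in at fit time.

```r
library(tidymodels)

# Same specification as in the lesson
lm_spec <- linear_reg() %>%
  set_engine("lm")

# The specification holds no data; it only describes the model type and engine
print(lm_spec)

# Show the engine call template that parsnip will fill in when fit() is called
lm_spec %>% translate()
```

Because the specification is separate from the data, the same lm_spec object can be reused in several workflows.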

Next, you create a workflow using workflow(). This workflow acts as a container for both your model and the data preprocessing steps. You add a formula with add_formula(mpg ~ disp + hp), which tells the workflow which variables to use as predictors and which as the outcome. The model specification is attached to the workflow with add_model(lm_spec).
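The container idea can be checked by pulling the components back out of the workflow. This sketch assumes a version of the workflows package that provides the extract_*() helpers; the object names follow the lesson.

```r
library(tidymodels)

lm_spec <- linear_reg() %>%
  set_engine("lm")

lm_workflow <- workflow() %>%
  add_formula(mpg ~ disp + hp) %>%
  add_model(lm_spec)

# Retrieve the preprocessor (here, the formula) stored in the workflow
wf_formula <- extract_preprocessor(lm_workflow)

# Retrieve the model specification stored in the workflow
wf_spec <- extract_spec_parsnip(lm_workflow)

print(wf_formula)
print(wf_spec)
```

Nothing has been fitted at this point; the workflow only records what should happen once data is supplied.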

Once the workflow is constructed, you fit it to the mtcars dataset using fit(data = mtcars). This step applies all preprocessing (in this case, formula parsing) and fits the model in one reproducible operation. When you print the workflow object, you see the structure of your pipeline, including the formula and model. Printing the fitted workflow shows the same structure together with the estimated coefficients. It does not show standard errors or fit statistics such as R-squared; for those, inspect the underlying model fit.
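A fitted workflow can be queried for more detail and used for prediction. The sketch below is one possible way to do this: it extracts the parsnip fit for a coefficient table, the underlying lm object for a base R summary, and predicts for two made-up cars. The new_cars values are hypothetical and only serve as an illustration.

```r
library(tidymodels)

fit_lm <- workflow() %>%
  add_formula(mpg ~ disp + hp) %>%
  add_model(linear_reg() %>% set_engine("lm")) %>%
  fit(data = mtcars)

# Coefficient table with estimates, standard errors, and p-values
coef_table <- fit_lm %>%
  extract_fit_parsnip() %>%
  tidy()
print(coef_table)

# Base R summary of the underlying lm object (includes R-squared)
fit_lm %>%
  extract_fit_engine() %>%
  summary()

# Hypothetical new observations; the workflow applies the formula for you
new_cars <- tibble(disp = c(160, 300), hp = c(110, 200))
preds <- predict(fit_lm, new_data = new_cars)
print(preds)
```

predict() on a workflow returns a tibble with one row per new observation, so the result can be bound back onto new_cars if needed.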

This approach ensures that every step is recorded and can be easily repeated or modified as your analysis evolves.
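As an illustration of modifying a pipeline, the sketch below swaps the formula while leaving the model specification untouched. The extra predictor wt is an assumption chosen for the example, not part of the lesson; update_formula() comes from the workflows package.

```r
library(tidymodels)

lm_workflow <- workflow() %>%
  add_formula(mpg ~ disp + hp) %>%
  add_model(linear_reg() %>% set_engine("lm"))

# Replace the formula; the model specification stays as it was
wt_workflow <- lm_workflow %>%
  update_formula(mpg ~ disp + hp + wt)

# Fit both versions with the same call
fit_original <- fit(lm_workflow, data = mtcars)
fit_wt <- fit(wt_workflow, data = mtcars)

# Compare the number of estimated coefficients
n_original <- length(coef(extract_fit_engine(fit_original)))
n_wt <- length(coef(extract_fit_engine(fit_wt)))
```

The original lm_workflow is not changed by update_formula(), so both variants remain available for comparison.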

Note

Always ensure that your preprocessing steps are compatible with your chosen model. For instance, some models require all predictors to be numeric, so categorical variables must be converted to dummy variables before fitting. If you forget to include necessary preprocessing in your workflow, you may encounter errors or misleading results. Review your workflow and model requirements carefully to avoid these issues.
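One way to keep such a conversion inside the workflow is to use a recipe instead of a formula. The sketch below treats cyl in mtcars as a categorical predictor, which is an assumption made purely for illustration. Note that lm() can handle factors through a formula on its own; the dummy step is shown because other engines expect numeric input.

```r
library(tidymodels)

# Assumption for illustration: treat cylinder count as a categorical predictor
cars <- mtcars %>%
  mutate(cyl = factor(cyl))

# Recipe that converts every nominal predictor into dummy variables
car_recipe <- recipe(mpg ~ disp + hp + cyl, data = cars) %>%
  step_dummy(all_nominal_predictors())

rec_workflow <- workflow() %>%
  add_recipe(car_recipe) %>%
  add_model(linear_reg() %>% set_engine("lm"))

# The dummy columns are created inside the workflow when it is fitted
fit_rec <- fit(rec_workflow, data = cars)

rec_coefs <- fit_rec %>%
  extract_fit_parsnip() %>%
  tidy()
print(rec_coefs)
```

Because the recipe is part of the workflow, the same dummy encoding is applied automatically when you later call predict() on new data.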


What is the primary purpose of a workflow in the tidymodels framework?


Section 4. Chapter 3

