Summary  
This chapter demonstrates how to use a fluent API to build a reproducible modeling pipeline by defining a model specification, chaining preprocessing and modeling steps in a single workflow, and fitting it in one unified operation.

General domain of usage  
Machine learning

When you build machine learning models, every step, from data preprocessing to model fitting, should be organized and reproducible. Chaining these steps together in a single pipeline reduces the risk of errors, such as preprocessing the training data one way and new data another, and makes your workflow easier to share and repeat. The `tidymodels` framework in R provides tools to combine preprocessing and modeling steps into one pipeline, making your analysis more robust and transparent.

```r
library(tidymodels)
options(crayon.enabled = FALSE)

# Define the model specification
lm_spec <- linear_reg() %>%
  set_engine("lm")

# Build the workflow
lm_workflow <- workflow() %>%
  add_formula(mpg ~ disp + hp) %>%
  add_model(lm_spec)
print(lm_workflow)

# Fit the workflow to the data
fit_lm <- lm_workflow %>%
  fit(data = mtcars)
print(fit_lm)
```

You begin by loading the `tidymodels` package, which attaches the packages used here, including `parsnip` for model specifications and `workflows` for pipelines. The first step is to define a model specification using `linear_reg()`, which declares a linear regression model without fitting anything yet. You then choose the computational engine with `set_engine("lm")`, so that the model is fitted with `stats::lm()`, R’s built-in linear model function.
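To see that a specification is only a description of the model, you can inspect it before any data is involved. The following is a minimal sketch; it uses `translate()` from `parsnip` to display the engine call the specification maps to, and it fits nothing.

```r
library(tidymodels)

# A specification only describes the model; it holds no data and no estimates
lm_spec <- linear_reg() %>%
  set_engine("lm")

# Mode and engine are stored on the specification object
lm_spec$mode    # "regression"
lm_spec$engine  # "lm"

# Show how the specification maps to the underlying stats::lm() call
translate(lm_spec)
```

Because the specification is separate from the data, the same object can be reused in several workflows.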

Next, you create a workflow using `workflow()`. This workflow acts as a container for both your model and the data preprocessing steps. You add a formula with `add_formula(mpg ~ disp + hp)`, which tells the workflow which variables to use as predictors and which as the outcome. The model specification is attached to the workflow with `add_model(lm_spec)`.
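A formula is the simplest preprocessor a workflow can hold. When more preprocessing is needed, a recipe can take its place. The sketch below is one possible variant, not part of the original example: the normalization step is an illustrative assumption, since `lm` does not require scaled predictors.

```r
library(tidymodels)

lm_spec <- linear_reg() %>%
  set_engine("lm")

# A recipe declares the outcome and predictors, plus preprocessing steps
# (centering and scaling here are illustrative, not required by lm)
lm_recipe <- recipe(mpg ~ disp + hp, data = mtcars) %>%
  step_normalize(all_numeric_predictors())

# The recipe replaces add_formula() as the workflow's preprocessor
rec_workflow <- workflow() %>%
  add_recipe(lm_recipe) %>%
  add_model(lm_spec)

print(rec_workflow)
```

The rest of the pipeline is unchanged: `fit(rec_workflow, data = mtcars)` estimates the recipe steps and the model together.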

Once the workflow is constructed, you fit it to the `mtcars` dataset using `fit(data = mtcars)`. This step applies all preprocessing (in this case, formula parsing) and fits the model in one reproducible operation. When you print the unfitted workflow object, you see the structure of your pipeline, including the formula and model. Printing the fitted workflow additionally shows the underlying `lm` call and the estimated coefficients. It does not show standard errors, p-values, or R²; for those, apply `tidy()`, `glance()`, or `summary()` to the extracted fit.
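Printing is a quick check, but the fitted workflow can also be queried programmatically. The sketch below rebuilds the same workflow so that it runs on its own, then extracts a coefficient table, the underlying `lm` object, and predictions. The five-row `new_data` is just a slice of `mtcars` used for illustration.

```r
library(tidymodels)

fit_lm <- workflow() %>%
  add_formula(mpg ~ disp + hp) %>%
  add_model(linear_reg() %>% set_engine("lm")) %>%
  fit(data = mtcars)

# Coefficient table with standard errors and p-values, as a tibble
coefs <- fit_lm %>%
  extract_fit_parsnip() %>%
  tidy()
print(coefs)

# The underlying lm object, for base R tools such as summary()
summary(extract_fit_engine(fit_lm))

# Predictions rerun the stored preprocessing on the new data
preds <- predict(fit_lm, new_data = head(mtcars, 5))
print(preds)
```

`predict()` on a workflow returns a tibble with a `.pred` column, one row per row of `new_data`.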

This approach ensures that every step is recorded and can be easily repeated or modified as your analysis evolves.
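As a sketch of what modifying the pipeline can look like, the example below swaps the formula with `update_formula()` and refits. Adding `wt` as a third predictor is an arbitrary choice made for illustration.

```r
library(tidymodels)

lm_workflow <- workflow() %>%
  add_formula(mpg ~ disp + hp) %>%
  add_model(linear_reg() %>% set_engine("lm"))

# Swap the formula; the model specification is left untouched
wider_workflow <- lm_workflow %>%
  update_formula(mpg ~ disp + hp + wt)

# Refit with the same call as before, now with the extra predictor
fit_wider <- fit(wider_workflow, data = mtcars)
coef_names <- names(coef(extract_fit_engine(fit_wider)))
print(coef_names)
```

The original `lm_workflow` is not changed by the update, so both versions can be fitted and compared.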

**Always ensure that your preprocessing steps are compatible with your chosen model.** For instance, some models require all predictors to be numeric, so categorical variables must be converted to dummy variables before fitting. If you forget to include necessary preprocessing in your workflow, you may encounter errors or misleading results. Review your workflow and model requirements carefully to avoid these issues.
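The sketch below shows one way to put dummy-variable encoding inside the workflow using a recipe. Treating `cyl` as a factor is an assumption made for the example. `lm` can handle factors on its own, so the step is shown only to illustrate the pattern; it matters for engines that expect a purely numeric predictor matrix.

```r
library(tidymodels)

# Treat cylinder count as a categorical predictor for this illustration
cars <- mtcars
cars$cyl <- factor(cars$cyl)

# Convert every nominal predictor to dummy (indicator) columns
dummy_recipe <- recipe(mpg ~ disp + cyl, data = cars) %>%
  step_dummy(all_nominal_predictors())

# Check what the model will actually receive
baked <- dummy_recipe %>%
  prep() %>%
  bake(new_data = NULL)
print(names(baked))

# Fit the recipe and the model as one workflow
dummy_fit <- workflow() %>%
  add_recipe(dummy_recipe) %>%
  add_model(linear_reg() %>% set_engine("lm")) %>%
  fit(data = cars)
print(dummy_fit)
```

Baking the prepared recipe before fitting is a cheap way to confirm that no categorical columns remain.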



Building a Simple Machine Learning Workflow