Building Classification Models
Predictive Modeling with Tidymodels in R
Section 1. Chapter 6

In many real-world scenarios, you need to do more than predict numeric values: you need to make decisions or assign observations to distinct groups. This is the job of classification. A classification problem involves predicting a categorical outcome, such as whether an email is spam or whether a patient has a particular disease, from a set of input features. Two common classification models are logistic regression and decision trees. Logistic regression is especially useful for binary classification tasks, while decision trees handle both binary and multiclass problems and produce interpretable decision rules. Both can be implemented in R with the Tidymodels suite, which provides a consistent interface for model specification, training, and evaluation.

options(crayon.enabled = FALSE)
library(tidymodels)

# Load example data
data(iris)

# Convert Species to a binary outcome for demonstration
iris_binary <- iris %>%
  filter(Species != "setosa") %>%
  mutate(Species = factor(Species))

# Split the data
set.seed(123)
iris_split <- initial_split(iris_binary, prop = 0.8)
iris_train <- training(iris_split)
iris_test <- testing(iris_split)

# Specify a logistic regression model
log_reg_spec <- logistic_reg() %>%
  set_engine("glm") %>%
  set_mode("classification")

# Fit the model
log_reg_fit <- log_reg_spec %>%
  fit(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = iris_train)

# View the model summary
summary(log_reg_fit$fit)

To build a classification model in Tidymodels, you start by specifying the type of model you want to use, such as logistic_reg() for logistic regression. The model specification defines the algorithm and its settings, including the computational engine (such as "glm" for generalized linear models) and the mode ("classification" or "regression"). You then fit the specification to your training data with the fit() function, supplying a formula and the data. A fitted logistic regression model contains an estimated coefficient for each predictor: the change in the log-odds of the outcome for a one-unit increase in that predictor, holding the other variables constant. With the "glm" engine, the log-odds refer to the second level of the outcome factor, which here is virginica. The model summary shows the direction of each effect and its statistical significance; because the size of a coefficient depends on the scale of its predictor, compare magnitudes only across predictors measured on comparable scales. This interpretability is one of the strengths of logistic regression in classification tasks.
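
The lesson's example stops at the model summary and never uses iris_test. As a minimal sketch of the evaluation step, the block below rebuilds the same split and model, converts the log-odds coefficients to odds ratios, and scores the held-out rows. The names coef_tbl and test_results are illustrative and not part of the lesson; tidy(), predict(), accuracy(), and conf_mat() are loaded with tidymodels.

```r
library(tidymodels)

# Rebuild the binary data set, split, and fitted model from the lesson example
iris_binary <- iris %>%
  filter(Species != "setosa") %>%
  mutate(Species = factor(Species))

set.seed(123)
iris_split <- initial_split(iris_binary, prop = 0.8)
iris_train <- training(iris_split)
iris_test  <- testing(iris_split)

log_reg_fit <- logistic_reg() %>%
  set_engine("glm") %>%
  set_mode("classification") %>%
  fit(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
      data = iris_train)

# Coefficients as a tibble; exponentiating a log-odds estimate gives an odds ratio
# (coef_tbl is an illustrative name)
coef_tbl <- tidy(log_reg_fit) %>%
  mutate(odds_ratio = exp(estimate))
print(coef_tbl)

# Predicted class and class probabilities for the held-out rows
# (test_results is an illustrative name)
test_results <- iris_test %>%
  select(Species) %>%
  bind_cols(
    predict(log_reg_fit, new_data = iris_test),
    predict(log_reg_fit, new_data = iris_test, type = "prob")
  )

# Share of correct predictions, then the full confusion matrix
print(accuracy(test_results, truth = Species, estimate = .pred_class))
print(conf_mat(test_results, truth = Species, estimate = .pred_class))
```

With only 20 test rows, the accuracy estimate is coarse: each misclassified flower moves it by five percentage points, so treat the number as a rough check rather than a precise measure.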

Task

Build and fit a decision tree classifier on the training data using Tidymodels.

  • Load the tidymodels package.
  • Define a decision tree model specification with the decision_tree() function.
  • Set the model's engine to "rpart" with the set_engine() function.
  • Set the mode to "classification" with the set_mode() function.
  • Fit the model to the provided training data with the fit() function. Use Species as the outcome and Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width as predictors.
  • Return the fitted model object.

Solution
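
One possible way to follow the task steps is sketched below; it is not the course's reference solution. It assumes the "provided training data" matches iris_train from the lesson example, so the split is rebuilt here to keep the block self-contained. The names tree_spec and tree_fit are illustrative.

```r
library(tidymodels)

# Assumption: the provided training data matches iris_train from the lesson example
iris_binary <- iris %>%
  filter(Species != "setosa") %>%
  mutate(Species = factor(Species))

set.seed(123)
iris_split <- initial_split(iris_binary, prop = 0.8)
iris_train <- training(iris_split)

# Specify a decision tree with the rpart engine in classification mode
tree_spec <- decision_tree() %>%
  set_engine("rpart") %>%
  set_mode("classification")

# Fit the tree using all four measurements as predictors
tree_fit <- tree_spec %>%
  fit(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
      data = iris_train)

# Return (and print) the fitted model object, including the learned split rules
tree_fit
```

Printing tree_fit lists the splits that rpart chose, which is where the interpretable rules mentioned earlier in the lesson come from.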
