Summary  
This chapter explains how to transform and mutate columns in a data frame by applying functions or calculations to add new columns or modify existing ones.  

General domain of usage  
Feature engineering in data science

In data science, you often need to create new columns or modify existing ones to prepare your dataset for modeling. This process is called **feature engineering**, and it can involve calculating ratios, categorizing continuous variables, or transforming data to better fit the requirements of statistical models. Deriving new features or recoding variables can help you capture important patterns, improve model performance, and make your data more interpretable.

```r
library(dplyr)

# Sample data frame
data <- tibble(
  name = c("Alice", "Bob", "Carol", "Dan"),
  age = c(25, 32, 28, 40),
  income = c(50000, 60000, 52000, 70000)
)

# Create a new column: income per year of age
data <- data %>%
  mutate(income_per_age = income / age)

print(as.data.frame(data))
```

The `mutate()` function from the `dplyr` package lets you add new columns or change existing ones in a data frame. You give the name of the column on the left of `=` and an expression that computes its values on the right. In the example, `income_per_age = income / age` creates a new column by dividing each person's income by their age. You can use arithmetic operations, built-in R functions, or even custom functions inside `mutate()`. The original columns remain unless you overwrite them by using the same column name.
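As a minimal sketch of that last point, the snippet below uses a built-in function (`log()`), a binning function (`cut()`), and a small custom function inside one `mutate()` call. The helper `rescale01()`, the new column names, and the age breakpoint of 30 are illustrative assumptions, not part of the original example.

```r
library(dplyr)

# Same sample data as in the example above
data <- tibble(
  name = c("Alice", "Bob", "Carol", "Dan"),
  age = c(25, 32, 28, 40),
  income = c(50000, 60000, 52000, 70000)
)

# Hypothetical helper: rescale a numeric vector to the range [0, 1]
rescale01 <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

features <- data %>%
  mutate(
    log_income = log(income),              # built-in function
    age_group = cut(                       # bin a continuous variable
      age,
      breaks = c(0, 30, Inf),
      labels = c("under_30", "30_plus"),
      right = FALSE                        # intervals are [0, 30) and [30, Inf)
    ),
    income_scaled = rescale01(income)      # custom function
  )

print(as.data.frame(features))
```

Because the result is assigned to `features`, the original `data` keeps only its three columns. All three new columns are computed for every row at once.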

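Be careful when creating or modifying columns. If the new value has length 1, `mutate()` recycles it across all rows, which may not be what you want; a vector of any other length that does not match the number of rows raises an error. Also, if you reuse an existing column name in `mutate()`, the result holds the new values in place of the original ones, and assigning that result back to the same variable discards the originals. Double-check your code to avoid unintentional data loss or errors.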
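The sketch below illustrates these cautions on the same sample data. The column names `country` and `bad`, the values assigned to them, and the placeholder error message are made up for illustration. It also shows what happens when the supplied vector has a length other than 1 or the number of rows.

```r
library(dplyr)

data <- tibble(
  name = c("Alice", "Bob", "Carol", "Dan"),
  age = c(25, 32, 28, 40),
  income = c(50000, 60000, 52000, 70000)
)

# A length-1 value is recycled to every row
with_constant <- data %>%
  mutate(country = "US")

# A vector whose length is neither 1 nor the number of rows is an error;
# tryCatch() turns that error into a placeholder string for this demo
mismatch <- tryCatch(
  data %>% mutate(bad = c(1, 2)),
  error = function(e) "error: size mismatch"
)

# Reusing an existing name replaces that column in the result
in_thousands <- data %>%
  mutate(income = income / 1000)

print(with_constant$country)
print(mismatch)
print(in_thousands$income)
print(data$income)  # unchanged, because the result went to a new variable
```

Assigning the result to a new variable, as done here with `in_thousands`, is one way to keep the original values available while you check the transformation.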

What happens if you assign a single value to a new column in `mutate()`?

