Transforming and Mutating Columns | Data Preparation and Cleaning
R for Data Scientists

Transforming and Mutating Columns

In data science, you often need to create new columns or modify existing ones to prepare your dataset for modeling. This process is called feature engineering, and it can involve calculating ratios, categorizing continuous variables, or transforming data to better fit the requirements of statistical models. Deriving new features or recoding variables can help you capture important patterns, improve model performance, and make your data more interpretable.

library(dplyr)

# Sample data frame
data <- tibble(
  name = c("Alice", "Bob", "Carol", "Dan"),
  age = c(25, 32, 28, 40),
  income = c(50000, 60000, 52000, 70000)
)

# Create a new column: income per year of age
data <- data %>%
  mutate(income_per_age = income / age)

print(as.data.frame(data))

The mutate() function from the dplyr package lets you add new columns or change existing ones in a data frame. Inside mutate(), you write the column name on the left of = and an expression that computes its values on the right. In the example, income_per_age = income / age creates a new column by dividing each person's income by their age. The expression can use arithmetic operations, built-in R functions, or your own custom functions. Existing columns are left as they are unless you assign to a name that already exists, in which case that column is replaced. mutate() returns a new data frame rather than changing its input, which is why the example assigns the result back to data.
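As a rough sketch of the kinds of expressions described above, the following applies a built-in function, a custom function, and a categorization of a continuous variable in a single mutate() call. The column names log_income, income_z, and age_group, the helper standardize(), and the age cut-offs are all made up for illustration; the sample values repeat those from the example earlier in this chapter.

```r
library(dplyr)

# Same sample values as the earlier example
people <- tibble(
  name = c("Alice", "Bob", "Carol", "Dan"),
  age = c(25, 32, 28, 40),
  income = c(50000, 60000, 52000, 70000)
)

# Hypothetical helper: center and scale a numeric vector
standardize <- function(x) {
  (x - mean(x)) / sd(x)
}

features <- people %>%
  mutate(
    log_income = log(income),          # built-in function
    income_z = standardize(income),    # custom function
    age_group = case_when(             # categorize a continuous variable
      age < 30 ~ "under 30",
      age < 40 ~ "30 to 39",
      TRUE ~ "40 and over"
    )
  )

print(as.data.frame(features))
```

Because the result is assigned to features, the people data frame keeps its original three columns. The cut-offs in case_when() are checked in order, so each row receives the label of the first condition it satisfies.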

Note

Be careful when creating or modifying columns: if the value you supply has length 1 (a single number or string), it is recycled to every row, which may not be what you want. A value of any other length that does not match the number of rows raises an error. Also, if you assign to an existing column name in mutate(), the original values are replaced in the result. Double-check your code to avoid unintentional data loss or errors.
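The pitfalls in this note can be sketched as follows. The column names source and flag and the value "survey" are hypothetical, chosen only for illustration; the sample values are the ones used earlier in this chapter.

```r
library(dplyr)

people <- tibble(
  name = c("Alice", "Bob", "Carol", "Dan"),
  age = c(25, 32, 28, 40),
  income = c(50000, 60000, 52000, 70000)
)

# A single value is recycled: every row gets the same entry
with_constant <- people %>%
  mutate(source = "survey")

# Reusing an existing name replaces that column's values in the result
in_thousands <- people %>%
  mutate(income = income / 1000)

# A value whose length is neither 1 nor the row count raises an error;
# tryCatch() turns that error into a plain string so the script continues
wrong_length <- tryCatch(
  people %>% mutate(flag = c(TRUE, FALSE)),
  error = function(e) "error"
)

print(with_constant$source)
print(in_thousands$income)
print(wrong_length)
```

Assigning each result to a new name, as done here, leaves people untouched. The original income values are lost only if the result is assigned back to the same variable.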


What happens if you assign a single value to a new column in mutate()?


Section 1. Chapter 4
