Learn Data Transformation | Data Manipulation and Cleaning
Data Analysis with R

Data Transformation

Data transformation is a crucial step in preparing raw data for analysis. It involves modifying, adding, or recoding variables to make the data more meaningful and analysis-ready.

Creating New Columns

A common transformation is to calculate new metrics from existing columns. For example, you might want to calculate the price per kilometer to assess how cost-effective a vehicle is.

Base R

You can create a new column by using the $ operator to define its name and assigning values to it.

df$price_per_km <- df$selling_price / df$km_driven
head(df)
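The division above assumes km_driven is never zero. As a minimal sketch on a small made-up data frame (the name cars and its values are illustrative, not taken from the course dataset), a zero divisor produces Inf rather than an error, and such results can be recoded to NA before analysis:

```r
# Toy data frame; the values are invented for illustration only.
cars <- data.frame(
  selling_price = c(450000, 300000, 120000),
  km_driven     = c(90000, 0, 60000)
)

# Dividing by zero yields Inf rather than an error.
cars$price_per_km <- cars$selling_price / cars$km_driven

# Recode non-finite results (Inf, -Inf, NaN) as missing.
cars$price_per_km[!is.finite(cars$price_per_km)] <- NA

print(cars)
```

Whether NA is the right replacement depends on the analysis; dropping such rows is another reasonable choice.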

dplyr

New columns can be added using the mutate() function. Inside mutate(), you specify the name of the new column and define how it should be calculated.

library(dplyr)  # provides mutate() and the %>% pipe
df <- df %>%
  mutate(price_per_km = selling_price / km_driven)

Converting and Transforming Text-Based Numeric Data

In real-world datasets, numeric information is often stored as text combined with non-numeric characters. For example, engine power values might appear as "68 bhp", which must be cleaned and converted before analysis.

Base R

You can use gsub() to remove unwanted text and then apply as.numeric() to convert the result into numbers. After conversion, additional transformations can be performed, such as converting brake horsepower (bhp) into kilowatts.

df$max_power <- as.numeric(gsub(" bhp", "", df$max_power))
df$max_power_kw <- df$max_power * 0.7457  # convert to kilowatts
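Note that as.numeric() returns NA for any string it cannot parse, so it is worth checking how many values failed to convert. The sketch below uses invented sample strings (raw_power and its contents are assumptions, not values from the course dataset) to show the idea:

```r
# Invented sample values; real data may contain other irregular entries.
raw_power <- c("68 bhp", "103.52 bhp", "", "null bhp")

# Strip the unit, then convert; unparseable strings become NA.
# suppressWarnings() hides the "NAs introduced by coercion" warning here
# because the NA count is checked explicitly on the next lines.
power_bhp <- suppressWarnings(as.numeric(gsub(" bhp", "", raw_power)))

# Count how many values could not be converted.
n_failed <- sum(is.na(power_bhp))

# Convert to kilowatts; NA stays NA after multiplication.
power_kw <- power_bhp * 0.7457
```

If n_failed is larger than expected, inspecting the original strings at those positions usually reveals the extra patterns that need cleaning.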

dplyr

The same process can be streamlined inside a single mutate() call, which combines text replacement, type conversion, and new column creation. Because mutate() evaluates its arguments in order, max_power_kw can use the max_power value converted on the line above it.

df <- df %>%
  mutate(
    max_power = as.numeric(gsub(" bhp", "", max_power)),
    max_power_kw = max_power * 0.7457
  )

Categorizing Data

You can create new categorical variables by grouping continuous values into meaningful categories. For example, cars can be classified into Low, Medium, or High price ranges based on their selling price.

Base R

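You can do this with nested ifelse() calls. ifelse() is vectorized, so the conditions are applied to every row: prices below 300000 are labelled Low, remaining prices below 700000 are labelled Medium, and all others are labelled High.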

df$price_category <- ifelse(df$selling_price < 300000, "Low",
                            ifelse(df$selling_price < 700000, "Medium", "High"))
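Base R also offers cut() for binning a numeric vector, which avoids nesting when there are many categories. This is an alternative to the approach above, not the course's method; the sketch uses invented prices placed on and around the two thresholds and compares the result with the nested ifelse() version:

```r
# Invented prices chosen to sit on and around the two thresholds.
selling_price <- c(150000, 300000, 699999, 700000, NA)

# right = FALSE makes each interval closed on the left: [a, b),
# which matches the `<` comparisons in the nested ifelse() version.
price_category <- cut(
  selling_price,
  breaks = c(-Inf, 300000, 700000, Inf),
  labels = c("Low", "Medium", "High"),
  right  = FALSE
)

# Same classification written with nested ifelse(), for comparison.
nested <- ifelse(selling_price < 300000, "Low",
                 ifelse(selling_price < 700000, "Medium", "High"))

# TRUE when both approaches agree on every element, including the NA.
identical(as.character(price_category), nested)
```

Unlike ifelse(), cut() returns a factor whose levels follow the order of the labels, which can be convenient for plotting and summaries.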

dplyr

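You can use the case_when() function as a replacement for nested ifelse() calls. Conditions are checked from top to bottom, and each row receives the value of the first condition it matches; a final TRUE condition acts as the catch-all. This allows multiple conditions to be written in a clean, readable format.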

df <- df %>%
  mutate(price_category = case_when(
    selling_price < 300000 ~ "Low",
    selling_price < 700000 ~ "Medium",
    TRUE ~ "High"
  ))
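One difference from nested ifelse() is worth keeping in mind: case_when() treats a condition that evaluates to NA as not matched, so a row with a missing selling_price falls through to the TRUE branch and is labelled "High". A minimal sketch on invented data (the cars data frame is an assumption for illustration) that keeps missing prices missing by testing for them first:

```r
library(dplyr)

# Invented prices, including one missing value.
cars <- data.frame(selling_price = c(250000, 500000, 900000, NA))

cars <- cars %>%
  mutate(price_category = case_when(
    is.na(selling_price)   ~ NA_character_,  # keep missing prices missing
    selling_price < 300000 ~ "Low",
    selling_price < 700000 ~ "Medium",
    TRUE                   ~ "High"
  ))

print(cars)
```

Whether missing prices should stay NA or receive their own label such as "Unknown" is a modelling decision; the point is to handle them explicitly rather than let the catch-all decide.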
