Reshaping Data with Pivot Functions | Feature Engineering and Data Transformation
R for Data Scientists


As you prepare data for analysis or modeling, you will often encounter datasets that are not structured in the most useful way. Sometimes data arrives in a wide format, where the values of one variable are spread across several columns (for example, one column per subject or per year), but your analysis or model expects a long format, where each row holds a single observation: the identifiers, a column naming what was measured, and the measured value. Other times you need to go from long to wide, for example to produce a compact table for a report. Pivoting reshapes a dataset between these two formats by rearranging the same values, which is essential when preparing features for machine learning, aggregating results, or plotting trends over time.

```r
library(tidyr)
library(dplyr)

# Sample data in wide format
scores <- data.frame(
  student = c("Alice", "Bob", "Carol"),
  math = c(90, 85, 88),
  english = c(95, 80, 92)
)
print(scores)

# Pivot from wide to long format
scores_long <- pivot_longer(
  scores,
  cols = c(math, english),
  names_to = "subject",
  values_to = "score"
)
print(as.data.frame(scores_long))

# Pivot back from long to wide format
scores_wide <- pivot_wider(
  scores_long,
  id_cols = student,
  names_from = subject,
  values_from = score
)
print(as.data.frame(scores_wide))
```

In pivot_longer(), the cols argument selects the columns to stack into rows. names_to gives the name of the new column that stores the original column names (here "subject"), and values_to gives the name of the new column that stores their cell values (here "score"). Each input row becomes one output row per selected column, so 3 students with 2 subject columns become 6 rows. pivot_wider() reverses the operation: id_cols lists the columns that identify each output row (here "student"), names_from names the column whose values become the new column headers, and values_from names the column whose values fill those new columns.
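As a minimal sketch, assuming tidyr 1.0 or later (the release that introduced pivot_longer() and pivot_wider()), the round trip can be checked programmatically. It swaps in the tidyselect form cols = -student, which selects every column except student, instead of listing the subject columns by name; that choice is illustrative and not part of the original example.

```r
library(tidyr)

scores <- data.frame(
  student = c("Alice", "Bob", "Carol"),
  math = c(90, 85, 88),
  english = c(95, 80, 92)
)

# Select every column except the identifier (illustrative alternative to c(math, english))
scores_long <- pivot_longer(
  scores,
  cols = -student,
  names_to = "subject",
  values_to = "score"
)

# 3 students x 2 subject columns -> 6 rows of (student, subject, score)
print(dim(scores_long))

# Widening again should restore the original layout
scores_wide <- pivot_wider(
  scores_long,
  id_cols = student,
  names_from = subject,
  values_from = score
)
print(isTRUE(all.equal(as.data.frame(scores_wide), scores)))
```

Checking dimensions before and after a pivot is a cheap way to catch a wrong cols selection early.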

Note

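Be careful when pivoting data. In pivot_wider(), each combination of the id_cols and names_from values must appear only once. If a combination is duplicated, there is no single value to put in that cell, so tidyr warns and returns list-columns instead of ordinary columns unless you tell it how to combine the duplicates with the values_fn argument. A pivot also fails with an error when it would create duplicate column names, for example when a value in names_from matches the name of an existing identifier column. Always check that your identifiers uniquely identify each observation before reshaping.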
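One way to follow that advice is sketched below, assuming a hypothetical long dataset in which Bob has two math scores (say, from a retake). dplyr's count() reveals the duplicated student and subject combination, and values_fn = mean resolves it by averaging. The mean is an arbitrary illustrative choice; depending on your data you might prefer max, the first value, or removing the duplicate before pivoting.

```r
library(tidyr)
library(dplyr)

# Hypothetical long data: Bob has two math scores (e.g., a retake)
scores_dup <- data.frame(
  student = c("Alice", "Bob", "Bob", "Alice", "Bob"),
  subject = c("math", "math", "math", "english", "english"),
  score   = c(90, 85, 89, 95, 80)
)

# Find identifier/name combinations that occur more than once
dups <- scores_dup %>%
  count(student, subject) %>%
  filter(n > 1)
print(dups)

# Resolve the duplicates explicitly by summarising them with the mean
scores_wide <- pivot_wider(
  scores_dup,
  id_cols = student,
  names_from = subject,
  values_from = score,
  values_fn = mean
)
print(as.data.frame(scores_wide))
```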

Question: What does the pivot_longer() function do in R?


Section 2, Chapter 2
