Learn Reshaping Data with Pivot Functions | Feature Engineering and Data Transformation
R for Data Scientists

Reshaping Data with Pivot Functions

As you prepare data for analysis or modeling, you will often find that a dataset is not structured the way your next step needs. Data may arrive in a wide format, where each measured category (such as a subject or a year) has its own column, while your analysis or model expects a long format, where each row holds a single observation and the category is recorded in a column of its own. Other times, you need to go from long to wide for reporting or visualization. Pivoting reshapes a dataset between these two formats without changing the underlying values. This is useful when preparing features for machine learning, aggregating results, or visualizing trends over time.

    library(tidyr)
    library(dplyr)

    # Sample data in wide format
    scores <- data.frame(
      student = c("Alice", "Bob", "Carol"),
      math = c(90, 85, 88),
      english = c(95, 80, 92)
    )
    print(scores)

    # Pivot from wide to long format
    scores_long <- pivot_longer(
      scores,
      cols = c(math, english),
      names_to = "subject",
      values_to = "score"
    )
    print(as.data.frame(scores_long))

    # Pivot back from long to wide format
    scores_wide <- pivot_wider(
      scores_long,
      id_cols = student,
      names_from = subject,
      values_from = score
    )
    print(as.data.frame(scores_wide))

When you use pivot_longer(), the cols argument specifies which columns to reshape into longer format. The names_to argument names the new column that will hold the original column names (like "subject" in the example). The values_to argument names the new column that will hold the values from those columns (like "score"). For pivot_wider(), the id_cols argument identifies the columns that uniquely identify each output row (such as "student"); if you omit it, every column not used in names_from or values_from is treated as an identifier. The names_from argument chooses the column whose values become the new column headers, and values_from chooses the column that fills them.
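The arguments described above can be explored with a small variation on the lesson's example. The sketch below reuses the same scores data and assumes a reasonably recent version of tidyr; the object names long_named, long_negated, and wide_default are illustrative only. It selects the columns to pivot in two different ways and then calls pivot_wider() without id_cols to show the default behaviour.

```r
library(tidyr)

# Same sample data as in the lesson
scores <- data.frame(
  student = c("Alice", "Bob", "Carol"),
  math = c(90, 85, 88),
  english = c(95, 80, 92)
)

# Name the columns to pivot explicitly ...
long_named <- pivot_longer(
  scores,
  cols = c(math, english),
  names_to = "subject",
  values_to = "score"
)

# ... or select everything except the identifier column
long_negated <- pivot_longer(
  scores,
  cols = -student,
  names_to = "subject",
  values_to = "score"
)

# Both selections pick the same columns, so the two results can be compared
print(identical(long_named, long_negated))

# With id_cols omitted, pivot_wider() treats every column not used in
# names_from or values_from as an identifier (here: student)
wide_default <- pivot_wider(
  long_named,
  names_from = subject,
  values_from = score
)
print(as.data.frame(wide_default))
```

Negative selection is convenient when a table has many value columns and only a few identifiers, since the call does not need to change when new value columns are added.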

Note

Be careful when pivoting data. If a combination of identifier columns and names_from values appears more than once, pivot_wider() cannot fit the values into a single cell: it issues a warning and returns list-columns unless you tell it how to summarise the duplicates. Column names that collide, for example a names_to value that matches an existing column, can cause errors or unexpected results. Always check that your identifiers are unique before reshaping.
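One way to act on this warning is to count identifier combinations before pivoting. The following is a minimal sketch, not a definitive recipe: scores_dup is hypothetical data invented for illustration, and averaging the duplicates with values_fn = mean is only one possible choice that may not suit every dataset.

```r
library(tidyr)
library(dplyr)

# Hypothetical data: Alice has two math scores,
# so the (student, subject) combination is not unique
scores_dup <- data.frame(
  student = c("Alice", "Alice", "Bob"),
  subject = c("math", "math", "math"),
  score = c(90, 70, 85)
)

# Count rows per (student, subject) pair and keep pairs that occur more than once
pair_counts <- count(scores_dup, student, subject)
duplicates <- filter(pair_counts, n > 1)
print(duplicates)

# Assumption: averaging is an acceptable way to combine the duplicate scores.
# values_fn applies the function to each group of values that share a cell.
scores_wide <- pivot_wider(
  scores_dup,
  id_cols = student,
  names_from = subject,
  values_from = score,
  values_fn = mean
)
print(as.data.frame(scores_wide))
```

If the duplicates table is empty, the pivot can proceed without values_fn. If it is not, deciding whether to average, take the latest value, or fix the source data is a judgment call that depends on what the duplicates mean.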

What does the pivot_longer function do in R?

Section 2. Chapter 2
