Learn Reshaping Data with Pivot Functions | Feature Engineering and Data Transformation
R for Data Scientists

Reshaping Data with Pivot Functions

As you prepare data for analysis or modeling, you often encounter datasets that are not structured in the most useful way. Sometimes data is in a wide format, where each measured variable (for example, each subject's score) gets its own column, but your analysis or model expects a long format, where each row holds one observation and a single column stores all the values. Other times, you need to go from long to wide for reporting or visualization. Pivoting reshapes a dataset between these formats without changing the values it contains. This flexibility is useful when preparing features for machine learning, aggregating results, or visualizing trends over time.

```r
library(tidyr)
library(dplyr)

# Sample data in wide format
scores <- data.frame(
  student = c("Alice", "Bob", "Carol"),
  math = c(90, 85, 88),
  english = c(95, 80, 92)
)
print(scores)

# Pivot from wide to long format
scores_long <- pivot_longer(
  scores,
  cols = c(math, english),
  names_to = "subject",
  values_to = "score"
)
print(as.data.frame(scores_long))

# Pivot back from long to wide format
scores_wide <- pivot_wider(
  scores_long,
  id_cols = student,
  names_from = subject,
  values_from = score
)
print(as.data.frame(scores_wide))
```

When you use pivot_longer(), the cols argument specifies which columns to reshape into longer format. The names_to argument tells R what to call the new column that will contain the names of the original columns (like "subject" in the example). The values_to argument sets the name for the new column that will store the values from those columns (like "score"). For pivot_wider(), the id_cols argument identifies the columns that should remain as identifiers (such as "student"), while names_from and values_from decide which columns create new headers and which supply their values.
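The arguments described above can also be written more compactly. The sketch below reuses the lesson's `scores` data and assumes a reasonably recent tidyr. It selects columns with the tidyselect expression `!student` ("everything except student") and leaves out `id_cols`, which then defaults to every column not named in `names_from` or `values_from`. It is one possible variation, not the only way to write these calls.

```r
library(tidyr)

# Same sample data as in the lesson
scores <- data.frame(
  student = c("Alice", "Bob", "Carol"),
  math    = c(90, 85, 88),
  english = c(95, 80, 92)
)

# cols accepts tidyselect syntax: here, every column except student
scores_long <- pivot_longer(
  scores,
  cols      = !student,
  names_to  = "subject",
  values_to = "score"
)

# One row per student-subject pair: 3 students x 2 subjects = 6 rows
print(as.data.frame(scores_long))

# id_cols is omitted, so it defaults to the columns not used in
# names_from or values_from (here: student)
scores_wide <- pivot_wider(
  scores_long,
  names_from  = subject,
  values_from = score
)

# Compare the round-trip result with the original data frame
print(all.equal(as.data.frame(scores_wide), scores))
```

If the round trip worked, the final comparison should report that the two data frames are equal, which is a quick sanity check after any reshape.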

Note

Be careful when pivoting data. If several rows share the same combination of identifier columns and names_from value, pivot_wider() cannot place a single value in each cell: it warns and returns list-columns unless you tell it how to combine the values with values_fn. Duplicate column names in the result can also cause errors. Always check that your identifier columns uniquely identify each row before reshaping.
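A minimal sketch of such a check follows. The `scores_dup` data frame and the `n_rows` column name are hypothetical and exist only for illustration. The code counts rows per identifier/subject combination, then resolves the duplicate with `values_fn = mean`. Averaging is an arbitrary choice here; whether to average, keep the latest value, or fix the source data depends on what the duplicates mean.

```r
library(tidyr)
library(dplyr)

# Hypothetical long data: Alice has two math scores (a duplicated key)
scores_dup <- data.frame(
  student = c("Alice", "Alice", "Bob", "Bob"),
  subject = c("math", "math", "math", "english"),
  score   = c(90, 70, 85, 80)
)

# List student/subject combinations that occur more than once
dupes <- scores_dup %>%
  count(student, subject, name = "n_rows") %>%
  filter(n_rows > 1)
print(as.data.frame(dupes))

# One way to resolve duplicates: summarise them while pivoting
scores_wide <- pivot_wider(
  scores_dup,
  id_cols     = student,
  names_from  = subject,
  values_from = score,
  values_fn   = mean
)
print(as.data.frame(scores_wide))
```

Combinations that never occur in the long data (Alice has no English score here) are filled with `NA` in the wide result.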

What does the pivot_longer function do in R?

Section 2. Chapter 2
