Data Cleaning and Transformation
When working with real-world datasets, you will often encounter messy or inconsistent data that must be cleaned and transformed before analysis. Common data cleaning tasks include renaming columns to more meaningful names, handling missing values so that calculations are not distorted, and recoding variables to standardize categories or create new ones. These steps are essential for making your data tidy and analysis-ready.
    # Load required libraries
    library(dplyr)

    # Example data frame
    df <- data.frame(
      id = 1:4,
      score = c(90, NA, 75, 88),
      group = c("A", "B", "A", "B")
    )

    # Use mutate to create a new variable and replace NA values in 'score'
    df_clean <- df %>%
      mutate(
        score_clean = ifelse(is.na(score), 0, score), # Replace NA with 0
        passed = score_clean >= 80 # Create new logical variable
      )

    print(df_clean)
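The other two tasks mentioned earlier, renaming columns and recoding variables, can be sketched in the same style. This is a minimal illustration rather than a prescribed workflow: the data frame raw, its terse column names, and the "control"/"treatment" labels are all hypothetical and chosen only for demonstration.

```r
library(dplyr)

# Hypothetical data with terse column names and inconsistent group labels
raw <- data.frame(
  ID = 1:4,
  scr = c(90, NA, 75, 88),
  grp = c("a", "B", "A", "b")
)

tidy <- raw %>%
  # rename() takes new_name = old_name pairs
  rename(id = ID, score = scr, group = grp) %>%
  mutate(
    # Standardize category labels to upper case
    group = toupper(group),
    # Recode the standardized labels into a new variable
    # (the label text is an assumption made for this sketch)
    group_label = case_when(
      group == "A" ~ "control",
      group == "B" ~ "treatment",
      TRUE ~ NA_character_
    )
  )

print(tidy)
```

Standardizing the labels before recoding keeps the case_when conditions short, and the final TRUE branch makes any unexpected category show up as NA instead of being silently mislabeled.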
To reshape your data for different analysis needs, the tidyr package provides powerful tools. The pivot_longer function transforms data from a wide format, where the values of one variable are spread across several columns, to a long format, where each row holds a single observation-variable pair. Conversely, pivot_wider converts long-format data back to wide format, spreading name-value pairs across multiple columns. These functions make it easy to tidy your data and prepare it for further analysis.
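A short round trip shows both functions at work. This is a sketch under assumptions: the data frame wide and its test1 and test2 columns are hypothetical, as are the new column names "test" and "score".

```r
library(tidyr)

# Hypothetical wide data: one column per test
wide <- data.frame(
  id = 1:3,
  test1 = c(90, 85, 75),
  test2 = c(88, NA, 80)
)

# Wide to long: the column names go into 'test', the cell values into 'score'
long <- pivot_longer(
  wide,
  cols = c(test1, test2),
  names_to = "test",
  values_to = "score"
)

# Long to wide: rebuild one column per distinct value of 'test'
wide_again <- pivot_wider(
  long,
  names_from = test,
  values_from = score
)

print(long)
print(wide_again)
```

The long result has one row per id and test combination, which is the shape that grouped summaries and most plotting functions expect. Pivoting it back recovers the original layout, with the missing value preserved as NA.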